Abstract

Given a large data matrix $A \in \mathbb{R}^{n\times n}$, we consider the problem of determining whether its
entries are i.i.d. with some known marginal distribution $A_{ij} \sim P_0$, or instead $A$ contains a
principal submatrix $A_{Q,Q}$ whose entries have marginal distribution $A_{ij} \sim P_1 \neq P_0$. As a special
case, the hidden (or planted) clique problem requires finding a planted clique in an otherwise
uniformly random graph.

Assuming unbounded computational resources, this hypothesis testing problem is statistically
solvable provided $|Q| \ge C \log n$ for a suitable constant $C$. However, despite substantial
effort, no polynomial time algorithm is known that succeeds with high probability when
$|Q| = o(\sqrt{n})$. Recently Meka and Wigderson [MW13b] proposed a method to establish lower
bounds within the Sum of Squares (SOS) semidefinite hierarchy.

Here we consider the degree-4 SOS relaxation, and study the construction of [MW13b] to
prove that SOS fails unless $k \ge C n^{1/3}/\log n$. An argument presented by Barak implies that
this lower bound cannot be substantially improved unless the witness construction is changed
in the proof. Our proof uses the moments method to bound the spectrum of a certain random
association scheme, i.e. a symmetric random matrix whose rows and columns are indexed by
the edges of an Erdős-Rényi random graph.
1 Introduction

Characterizing the computational complexity of statistical estimation and statistical learning
problems is an outstanding challenge. On one hand, a large part of research in this area focuses on
the analysis of specific polynomial-time algorithms, thus establishing upper bounds on the problem 
complexity. On the other, information-theoretic techniques are used to derive fundamental limits 
beyond which no algorithm can solve the statistical problem under study. While in some cases 
algorithmic and information-theoretic bounds match, in many other examples a large gap remains 
in which the problem is solvable assuming unbounded resources but simple algorithms fail. The 
hidden clique and hidden submatrix problems are prototypical examples of this category. 

In the hidden submatrix problem, we are given a symmetric data matrix $A \in \mathbb{R}^{n\times n}$ and two
probability distributions $P_0$ and $P_1$ on the real line, with $E_{P_0}\{X\} = 0$ and $E_{P_1}\{X\} = \mu > 0$.
We want to distinguish between two hypotheses (we set by convention $A_{ii} = 0$ for all
$i \in [n] = \{1, 2, \dots, n\}$):

Hypothesis $H_0$: The entries of $A$ above the diagonal $(A_{ij})_{i<j}$ are i.i.d. random variables with the
same marginal law $A_{ij} \sim P_0$.

Hypothesis $H_1$: Given a (hidden) subset $Q \subseteq [n]$, the entries $(A_{ij})_{i<j}$ are independent with

$$A_{ij} \sim \begin{cases} P_1 & \text{if } \{i,j\} \subseteq Q, \\ P_0 & \text{otherwise.} \end{cases} \qquad (1)$$

Further, $Q$ is a uniformly random subset conditional on its size, which is fixed to $|Q| = k$.

Of interest is also the estimation version of this problem, whereby the special subset Q is known to 
exist, and an algorithm is sought that identifies Q with high probability. 
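
To make the two hypotheses concrete, the following minimal sketch draws one instance of the data matrix. It assumes the Gaussian case $P_0 = N(0,1)$, $P_1 = N(\mu,1)$ that is used later in the paper; the function name `sample_hidden_submatrix` and its arguments are illustrative and not part of the original text.

```python
import numpy as np

def sample_hidden_submatrix(n, k, mu, planted, rng):
    """Draw a symmetric matrix A under H0 (planted=False) or H1 (planted=True).

    Assumption: P0 = N(0, 1) and P1 = N(mu, 1); the diagonal is set to 0 by convention.
    """
    # i.i.d. P0 entries strictly above the diagonal
    upper = np.triu(rng.standard_normal((n, n)), k=1)
    Q = np.array([], dtype=int)
    if planted:
        # hidden subset: uniformly random among subsets of size k
        Q = rng.choice(n, size=k, replace=False)
        mask = np.zeros((n, n), dtype=bool)
        mask[np.ix_(Q, Q)] = True
        # shift the mean by mu for pairs {i, j} contained in Q
        upper += mu * np.triu(mask, k=1)
    A = upper + upper.T  # symmetrize; the diagonal stays 0
    return A, np.sort(Q)
```

The estimation version of the problem would take `A` as input and try to return the set `Q`.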

This model encapsulates the basic computational challenges underlying a number of problems
in which we need to estimate a matrix that is both sparse and low-rank. Such problems arise
across genomics, signal processing, social network analysis, and machine learning [SWPN09, JL09,
OJF+12].

The hidden clique (or ‘planted clique’) problem [Jer92] is a special case of the above setting, and
has attracted considerable interest within theoretical computer science. Let $\delta_x$ denote the Dirac
delta distribution at the point $x \in \mathbb{R}$. The hidden clique problem corresponds to the distributions

$$P_1 = \delta_{+1}, \qquad P_0 = \tfrac{1}{2}\,\delta_{+1} + \tfrac{1}{2}\,\delta_{-1}. \qquad (2)$$


In this case, the data matrix $A$ can be interpreted as the adjacency matrix of a graph $G$ over $n$
vertices (whereby $A_{ij} = +1$ encodes presence of edge $\{i,j\}$ in $G$, and $A_{ij} = -1$ its absence). Under
hypothesis $H_1$, the set $Q$ induces a clique in the (otherwise) random graph $G$. For the rest of
this introduction, we shall focus on the hidden clique problem, referring to Section 2 for a formal
statement of our general results.

The largest clique in a uniformly random graph has size $2\log_2 n + o(\log n)$, with high probability
[GM75]. Thus, allowing for exhaustive search, the hidden clique problem can be solved when
$k \ge (2+\varepsilon)\log_2 n$. On the other hand, despite significant efforts [AKS98, AV11, DGGP11, FR10,
DM14], no polynomial time algorithm is known to work when $k = o(\sqrt{n})$. As mentioned above,
this is a prototypical case for which a large gap exists between the performance of well-understood
polynomial-time algorithms and the ultimate information-theoretic (or statistical) limits. This
remark motivated an ongoing quest for computational lower bounds.

Finding the maximum clique in a graph is a classical NP-hard problem [Kar72]. Even a very
rough approximation to its size is hard to find [Has96, Kho01]. In particular, it is hard to detect
the presence of a clique of size $n^{1-\varepsilon}$ in a graph with $n$ vertices.

Unfortunately, worst-case reductions do not imply computational lower bounds for distributions
of random instances dictated by natural statistical models. Over the last two years there have been
fascinating advances in crafting careful reductions that preserve the instance distribution in specific
cases [BR13, MW13a, CX14, HWX14, CLR15]. This line of work typically establishes that several
detection problems (sparse PCA, hidden submatrix, hidden community) are at least as hard as the
hidden clique problem with $k = o(\sqrt{n})$. This approach has two limitations:

(i) It yields conditional statements relying on the unproven assumption that the hidden clique 
problem is hard. In absence of any ‘completeness’ result, this is a strong assumption that 
calls for further scrutiny. 

(ii) Reductions among instance distributions are somewhat fragile with respect to changes in the
distribution. For instance, it is not known whether the hidden submatrix problem with
Gaussian distributions $P_0 = N(0,1)$ and $P_1 = N(\mu,1)$ is at least as hard as the hidden clique
problem, although a superficial look might suggest that they are very similar.

A complementary line of attack consists in proving unconditional lower bounds for broad classes
of algorithms. In an early contribution, Jerrum [Jer92] established such a lower bound for a class
of Markov Chain Monte Carlo methods. Feldman et al. [FGR+12] considered a query-based
formulation of the problem and proved a similar result for ‘statistical algorithms.’ Closer to the
present paper is the work of Feige and Krauthgamer [FK00], who analyzed the Lovász-Schrijver
semidefinite programming (SDP) hierarchy. Remarkably, these authors proved that $r$ rounds of
this hierarchy (with complexity $n^{O(r)}$) fail to detect the hidden clique unless $k \gtrsim \sqrt{n/2^{r}}$. (Here
and below we write $f(n,r,\dots) \gtrsim g(n,r,\dots)$ if there exists a constant $C$ such that
$f(n,r,\dots) \ge C\, g(n,r,\dots)$.)

While this failure of the Lovász-Schrijver hierarchy provides insightful evidence towards the
hardness of the hidden-clique problem, an even stronger indication could be obtained by establishing
an analogous result for the Sum of Squares (SOS) hierarchy [Sho87, Las01, Par03]. This SDP
hierarchy unifies most convex relaxations developed for this and similar problems. Its close
connection with the unique games conjecture has led to the idea that SOS might indeed be an ‘optimal’
algorithm for a broad class of problems [BS14]. Finally, many of the low-rank estimation problems
mentioned above naturally include quadratic constraints, which are most naturally expressed within
the SOS hierarchy.

The SOS hierarchy is formulated in terms of polynomial optimization problems. The level of 
a relaxation in the hierarchy corresponds to the largest degree d of any monomial whose value is 
explicitly treated as a decision variable. Meka and Wigderson [MW13b] proposed a construction of
a sequence of feasible solutions, or witnesses (one for each degree d), that can be used to prove lower 
bounds for the hidden clique problem within the SOS hierarchy. The key technical step consisted 
in proving that a certain moment matrix is positive semidefinite: unfortunately this part of their 
proof contained a fatal flaw. 

In the present paper we undertake the more modest task of analyzing the Meka-Wigderson
witness for the level $d = 4$ of the SOS hierarchy. This is the first level at which the SOS hierarchy
differs substantially from the baseline spectral algorithm of Alon, Krivelevich and Sudakov [AKS98],
or from the Lovász-Schrijver hierarchy. We prove that this relaxation fails unless

$$k \gtrsim \frac{n^{1/3}}{\log n}. \qquad (3)$$

Notice that the natural guess would be that the SOS hierarchy fails (for any bounded $d$) whenever
$k = o(\sqrt{n})$. While our result falls short of establishing this, an argument presented in [Bar14]
shows that this is a limitation of the Meka-Wigderson construction. In other words, by refining our
analysis it is impossible to improve the bound (3) except, possibly, by removing the logarithmic
factor.

Apart from the lower bound on the hidden clique problem, our analysis provides two additional 
sets of results: 


• We apply a similar witness construction to the hidden submatrix problem with entry
distributions $P_0 = N(0,1)$, $P_1 = N(\mu,1)$. We define a polynomial-time computable statistical test
that is based on a degree-4 SOS relaxation of a nearly optimal combinatorial test. We show
that this fails unless $k \gtrsim \mu^{-1} n^{1/3}/\log n$.

• As mentioned above, the main technical contribution consists in proving that a certain random
matrix is (with high probability) positive semidefinite. Abstractly, the random matrix in
question is a function of an underlying (Erdős-Rényi) random graph $G$ over $n$ vertices. The
matrix has rows/columns indexed by subsets of size at most $d/2 = 2$, and elements depending
on the subgraphs of $G$ induced by those subsets. We shall loosely refer to this type of random
matrix as a random association scheme.

In order to prove that this witness is positive semidefinite, we decompose the linear space on
which it acts into irreducible representations of the group of permutations over $n$ objects. We
then use the moment method to characterize each submatrix defined by this decomposition,
and paste together the results to obtain our final condition for positivity.

We believe that both the matrix definition and the proof technique are so natural that they
are likely to be useful in related problems.

• As an illustration of the last point, our analysis covers the case of Erdős-Rényi graphs with
sublinear average degree (namely, with average degree of order $n^{1-a}$, $a < 1/12$). In particular,
it is easy to derive sum-of-squares lower bounds for finding cliques in such graphs.
The rest of the paper is organized as follows. In Section 2 we state our main technical result,
which concerns the spectrum of random association schemes. We then show that it implies lower
bounds for the hidden clique and hidden submatrix problems. Section 3 presents a brief outline of
the proof. Finally, Section 4 presents the proof of our main technical result.

While this paper was being written, we became aware through [Bar14] that, in still unpublished
work, Meka, Potechin and Wigderson proved that the degree-$d$ SOS relaxation is unsuccessful unless
$k \gtrsim n^{1/d}$. It would be interesting to compare the proof techniques.


2 Main results 


In this section we present our results. Subsection 2.1 introduces a feasible random association
scheme that is a slight generalization of the witness developed in [MW13b] (for the degree $d = 4$
SOS). We state conditions implying that this matrix is positive semidefinite with high probability.
These conditions are in fact obtained by specializing a more general result stated in Proposition
4.1. We then derive implications for hidden cliques and hidden submatrices.


2.1 Positivity of the Meka-Wigderson witness 

We will denote by $G(n,p)$ the undirected Erdős-Rényi random graph model on $n$ vertices, with
edge probability $p$. A graph $G = (V,E) \sim G(n,p)$ has vertex set $V = [n] = \{1, 2, \dots, n\}$, and edge
set $E$ defined by letting, for each $i < j \in [n]$, $\{i,j\} \in E$ independently with probability $p$.

The random association scheme $M = M(G,\alpha)$ can be thought of as a generalization of the
adjacency matrix of $G$, depending on the graph $G$ and parameters $\alpha = (\alpha_1, \alpha_2, \alpha_3, \alpha_4) \in \mathbb{R}^4$. In order
to define the matrix $M$ we first need to set up some notation. For an integer $r$, we let $\binom{[n]}{r}$ denote
the set of all subsets of $[n]$ of size exactly $r$, and $\binom{[n]}{\le r}$ denote the set of all subsets of size at most
$r$. We also let $\emptyset$ denote the empty set.

We shall often identify the collection of subsets of size one, $\binom{[n]}{1} = \{\{i\} : i \in [n]\}$, with $[n]$.
Also, we identify $\binom{[n]}{2}$ with the set of ordered pairs $\{(i,j) : i,j \in [n],\ i < j\}$. If $A = \{i,j\}$ with $i < j$
we call $i$ ($j$) the head (respectively, tail) of $A$, denoted by $h(A)$ (respectively, $t(A)$).

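Given the graph $G$ and a set $A \subseteq [n]$, we let $G_A$ denote the subgraph of $G$ induced by $A$. We
define the indicator $\mathcal{G}_A$

$$\mathcal{G}_A = \begin{cases} 1 & \text{if } G_A \text{ is a clique,} \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$

For convenience of notation we let $\mathcal{G}_{ij} \equiv \mathcal{G}_{\{i,j\}}$ and let $g_{ij} \equiv \mathcal{G}_{ij} - E\{\mathcal{G}_{ij}\}$ be the centered versions of
the variables $\mathcal{G}_{ij}$. We also set $g_{ii} \equiv 0$.

We can now define the matrix $M = M(G,\alpha) \in \mathbb{R}^{\binom{[n]}{\le 2} \times \binom{[n]}{\le 2}}$ as follows. For any pair of sets
$A, B \in \binom{[n]}{\le 2}$ we have:

$$M_{A,B} = \alpha_{|A\cup B|}\, \mathcal{G}_{A\cup B}, \qquad (5)$$

with $\alpha_0 \equiv 1$.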
Theorem 1. Suppose $\alpha$, $p$ satisfy:


a\ = k , 



a 3 


K 3 
P 3 ’ 


CK4 



$$p \ge c\,(\kappa \log n)^{1/4}\, n^{1/6}, \qquad (6)$$

for some $\kappa \in [\log n/n,\ n^{-2/3}/\log n]$ and $c$ a large enough absolute constant. If $G \sim G(n,p)$ is a
random graph with edge probability $p$ then, for every $n$ large enough,

$$P\{M(G,\alpha) \succeq 0\} \ge 1 - \frac{1}{n}. \qquad (7)$$

The proof of this theorem can be found in Section 4. As mentioned above, a more general set
of conditions that imply $M(G,\alpha) \succeq 0$ with high probability is given in Proposition 4.1. The proof
of Theorem 1 consists in checking that the conditions of Proposition 4.1 hold and deriving the
consequences.


2.2 A Sum of Squares lower bound for Hidden Clique 

We denote by $G(n,p,k)$ the hidden clique model, i.e. the distribution over graphs $G = (V,E)$, with
vertex set $V = [n]$, a subset $Q \subseteq [n]$ of $k$ uniformly random vertices forming a clique, and every
other edge present independently with probability $p$.

The SOS relaxation of degree $d = 4$ for the maximum clique problem [Tul09, Bar14] is a
semidefinite program, whose decision variable is a matrix $X \in \mathbb{R}^{\binom{[n]}{\le 2} \times \binom{[n]}{\le 2}}$:

maximize $\sum_{i\in[n]} X_{\{i\},\{i\}}$, (8)

subject to: $X \succeq 0$, $X_{S_1,S_2} \in [0,1]$,
$X_{S_1,S_2} = 0$ when $S_1 \cup S_2$ is not a clique in $G$,
$X_{S_1,S_2} = X_{S_3,S_4}$ for $S_1 \cup S_2 = S_3 \cup S_4$,
$X_{\emptyset,\emptyset} = 1$.


Denote by $\mathrm{Val}(G; d = 4)$ the value of this optimization problem for graph $G$ (which is obviously an
upper bound on the size of the maximum clique in $G$). We can then try to detect the clique (i.e.
distinguish hypotheses $H_1$ and $H_0$ defined in the introduction), by using the test statistic

$$T(G) = \begin{cases} 0 & \text{if } \mathrm{Val}(G;4) \le c_* k, \\ 1 & \text{if } \mathrm{Val}(G;4) > c_* k, \end{cases} \qquad (9)$$

with $c_*$ a numerical constant. The rationale for this test is as follows: if we replace $\mathrm{Val}(G;4)$ by
the size of the largest clique, then the above test is essentially optimal, i.e. detects the clique with
high probability as soon as $k \gtrsim \log n$ (with $c_* = 1$).

We then have the following immediate consequence of Theorem 1.

Corollary 2.1. Suppose $G \sim G(n, 1/2)$. Then, with probability at least $1 - n^{-1}$, the degree-4 SOS
relaxation has value

$$\mathrm{Val}(G;4) \gtrsim \frac{n^{1/3}}{\log n}. \qquad (10)$$

Proof. Consider $M(G,\alpha)$ from Theorem 1 (with $p = 1/2$). For $M(G,\alpha)$ to be positive semidefinite
with high probability, we set $\kappa = c_0 n^{-2/3}/\log n$ for some absolute constant $c_0$. It is easy to check
that $M(G,\alpha)$ is a feasible point for the optimization problem (8). Recalling that $M_{\{i\},\{i\}} = \alpha_1 = \kappa$,
we conclude that the objective function at this point is $n\kappa = c_0 n^{1/3}/\log n$, and the claim follows. □


We are now in a position to derive a formal lower bound on the test $T(\,\cdot\,)$.

Theorem 2. The degree-4 Sum-of-Squares test for the maximum clique problem, defined in Eq. (9),
fails to distinguish between $G \sim G(n,k,1/2)$ and $G \sim G(n,1/2)$ with high probability if
$k \lesssim n^{1/3}/\log n$.

In particular, $T(G) = 1$ with high probability both for $G \sim G(n,k,1/2)$ and for $G \sim G(n,1/2)$.


Proof of Theorem 2. Assume $k \le c_1 n^{1/3}/\log n$ for $c_1$ a sufficiently small constant. For
$G \sim G(n,1/2)$, Corollary 2.1 immediately implies that $\mathrm{Val}(G;4) > c_* k$, with high probability.

For $G \sim G(n,k,1/2)$, we obviously have $\mathrm{Val}(G;4) \ge k$ (because SOS gives a relaxation). To
obtain a larger lower bound, recall that $Q \subseteq [n]$ indicates the vertices in the clique. The subgraph
$G_{Q^c}$ induced by the set of vertices $Q^c = [n]\setminus Q$ is distributed as $G(n-k, 1/2)$. Further, we obviously
have

$$\mathrm{Val}(G;4) \ge \mathrm{Val}(G_{Q^c};4). \qquad (11)$$

Indeed we can always set to 0 the variables indexed by sets $A \subseteq [n]$ with $A \not\subseteq Q^c$. Hence, applying again
Corollary 2.1 we deduce that, with probability $1 - (n-k)^{-1}$, $\mathrm{Val}(G;4) \ge C(n-k)^{1/3}/\log(n-k)$,
which is larger than $c_* k$. Hence $T(G) = 1$ with high probability. □


2.3 A Sum of Squares lower bound for Hidden Submatrix 

As mentioned in the introduction, in the hidden submatrix problem we are given a matrix $A \in \mathbb{R}^{n\times n}$,
which is generated according to either hypothesis $H_0$ or hypothesis $H_1$ defined there. To avoid
unnecessary technical complications, we shall consider distributions $P_0 = N(0,1)$ (for all the entries
in $A$ under $H_0$) and $P_1 = N(\mu,1)$ (for the entries $A_{ij}$, $i,j \in Q$ under $H_1$).

In order to motivate our definition of an SOS-based statistical test, we begin by introducing a
nearly-optimal combinatorial test, call it $T_{\rm comb}$. This test essentially looks for a principal submatrix
of $A$ of dimension $k$, with average value larger than $\mu/2$. Formally


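$$T_{\rm comb}(A) = \begin{cases} 1 & \text{if } \exists\, x \in \{0,1\}^n \text{ such that } \sum_{i\in[n]} x_i = k \text{ and } \sum_{i<j} A_{ij}\, x_i x_j \ge \frac{1}{2}\binom{k}{2}\mu, \\ 0 & \text{otherwise.} \end{cases} \qquad (12)$$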


A straightforward union-bound calculation shows that $T_{\rm comb}(\,\cdot\,)$ succeeds with high probability
provided $k \gtrsim \log n$.
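
For intuition, the combinatorial test can be written as an exhaustive search over $k$-subsets. The sketch below is only practical for very small $n$ and $k$ (the search takes $\binom{n}{k}$ steps). It assumes the reading of the test suggested by the surrounding text, namely that the test accepts when some $k \times k$ principal submatrix has above-diagonal sum at least $\frac{1}{2}\binom{k}{2}\mu$; the function name `t_comb` is illustrative.

```python
import itertools
import math
import numpy as np

def t_comb(A, k, mu):
    """Brute-force combinatorial test (assumed form, see lead-in).

    Returns 1 if some k-subset S satisfies
    sum_{i<j, i,j in S} A_ij >= 0.5 * C(k, 2) * mu, and 0 otherwise.
    """
    n = A.shape[0]
    threshold = 0.5 * math.comb(k, 2) * mu
    for S in itertools.combinations(range(n), k):
        idx = np.array(S)
        # sum of the entries above the diagonal of the principal submatrix
        block_sum = np.triu(A[np.ix_(idx, idx)], k=1).sum()
        if block_sum >= threshold:
            return 1
    return 0
```

The exponential cost of this search is precisely what motivates replacing the binary vectors $x$ by a convex relaxation.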

As in the previous section, the degree-4 SOS relaxation of the set of binary vectors $x \in \{0,1\}^n$
consists in the following convex set of matrices

$$C_4(n) = \Big\{ X \in \mathbb{R}^{\binom{[n]}{\le 2}\times\binom{[n]}{\le 2}} :\ X \succeq 0,\ X_{S_1,S_2} \in [0,1],\ X_{\emptyset,\emptyset} = 1,\ X_{S_1,S_2} = X_{S_3,S_4} \text{ for all } S_1 \cup S_2 = S_3 \cup S_4 \Big\}. \qquad (13)$$

This suggests the following relaxation of the test $T_{\rm comb}(\,\cdot\,)$:


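$$T(A) = \begin{cases} 1 & \text{if there exists } X \in C_4(n) \text{ such that } \sum_{i\in[n]} X_{\{i\},\{i\}} = k \text{ and } \sum_{i<j} A_{ij}\, X_{\{i\},\{j\}} \ge \frac{1}{2}\binom{k}{2}\mu, \\ 0 & \text{otherwise.} \end{cases} \qquad (14)$$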

We begin by stating a corollary of Theorem 1.

Corollary 2.2. Assume $A$ is distributed according to hypothesis $H_0$, i.e. $A_{ij} \sim N(0,1)$ for all
$i < j \in [n]$. Then, with probability at least $1 - 2n^{-1}$, there exists $X \in C_4(n)$ such that

$$\sum_{i\in[n]} X_{\{i\},\{i\}} \gtrsim \frac{n^{1/3}}{\log n}, \qquad \sum_{i,j\in[n],\, i<j} A_{ij}\, X_{\{i\},\{j\}} \gtrsim \frac{n^{2/3}}{(\log n)^2}. \qquad (15)$$

Proof. Fix $\lambda$ a sufficiently large constant and let $G$ be the graph with adjacency matrix $\mathcal{G}$ given by
$\mathcal{G}_{ij} = 1(A_{ij} \ge \lambda)$. Note that this is an Erdős-Rényi random graph $G \sim G(n,p)$ with edge probability
$p = \Phi(-\lambda)$. (Throughout this proof, we let $\phi(z) = e^{-z^2/2}/\sqrt{2\pi}$ denote the Gaussian density, and
$\Phi(z) = \int_{-\infty}^{z} \phi(t)\, dt$ the Gaussian distribution function.)
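
The thresholding step can be checked numerically. The sketch below assumes standard Gaussian entries above the diagonal and a threshold written `lam`; it compares the empirical edge density of the thresholded graph with $\Phi(-\lambda)$, and the empirical mean of $A_{ij}\,1(A_{ij} \ge \lambda)$ with the Gaussian density at the threshold, which is the identity $E\{Z\,1(Z \ge \lambda)\} = \phi(\lambda)$ for $Z \sim N(0,1)$. The function names are illustrative.

```python
import math
import numpy as np

def gaussian_pdf(z):
    """Standard Gaussian density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def gaussian_cdf(z):
    """Standard Gaussian distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def threshold_graph(A, lam):
    """Adjacency matrix with entries 1(A_ij >= lam) for i != j, and zero diagonal."""
    G = (A >= lam).astype(int)
    np.fill_diagonal(G, 0)
    return G

def empirical_checks(n, lam, rng):
    """Return (edge density, mean of A_ij * 1(A_ij >= lam)) over pairs i < j."""
    upper = np.triu(rng.standard_normal((n, n)), k=1)
    A = upper + upper.T
    G = threshold_graph(A, lam)
    iu = np.triu_indices(n, k=1)
    return G[iu].mean(), (A[iu] * G[iu]).mean()
```

With `n` in the hundreds, both empirical quantities should be close to `gaussian_cdf(-lam)` and `gaussian_pdf(lam)` respectively.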

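We choose $X = M(G,\alpha)$ a random association scheme, where $\alpha$ is set according to Theorem 1
with

$$\kappa = \frac{c_2}{n^{2/3} \log n}, \qquad (16)$$

with $c_2$ a suitably small constant. This ensures that the conditions of Theorem 1 are satisfied,
whence $X \in C_4(n)$ with high probability. Further, by definition

$$\sum_{i\in[n]} X_{\{i\},\{i\}} = n\kappa = \frac{c_2\, n^{1/3}}{\log n}. \qquad (17)$$

It remains to check that the second inequality in (15) holds. We have

$$\sum_{i,j\in[n],\, i<j} A_{ij}\, X_{\{i\},\{j\}} = \frac{2\kappa^2}{p} \sum_{i,j\in[n],\, i<j} A_{ij}\, \mathcal{G}_{ij}. \qquad (18)$$

Note that

$$E\Big\{ \sum_{i,j\in[n],\, i<j} A_{ij}\, \mathcal{G}_{ij} \Big\} = \binom{n}{2}\, E\{A_{12}\, 1(A_{12} \ge \lambda)\} = \binom{n}{2}\, \phi(\lambda). \qquad (19)$$

Note that the random variables $(A_{ij}\mathcal{G}_{ij})_{i<j}$ are independent and subgaussian. By a standard
concentration-of-measure argument we have, with probability at least $1 - n^{-2}$, for a suitably small
constant $c'$, $\sum_{i<j} A_{ij}\mathcal{G}_{ij} \ge c' n^2 \phi(\lambda)$ and hence

$$\sum_{i,j\in[n],\, i<j} A_{ij}\, X_{\{i\},\{j\}} \gtrsim \frac{n^{2/3}}{(\log n)^2}. \qquad (20)$$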

Theorem 3. Consider the Hidden Submatrix problem with entry distributions $P_0 = N(0,1)$ and
$P_1 = N(\mu,1)$.

Then, the degree-4 Sum-of-Squares test defined in Eq. (14) fails to distinguish between hypotheses
$H_0$ and $H_1$ if $k \lesssim \mu^{-1} n^{1/3}/\log n$. In particular, $T(A) = 1$ with high probability both under $H_0$
and under $H_1$.


Proof. First consider $A$ distributed according to hypothesis $H_0$. Note that, if $X_0 \in C_4(n)$ and
$s \in [0,1]$ is a scaling factor, then $sX_0 \in C_4(n)$. Therefore (by choosing $s = c\,k\,n^{-1/3}\log n$ for a suitable
constant $c$) Corollary 2.2 implies that with high probability there exists $X \in C_4(n)$ such that

$$\sum_{i\in[n]} X_{\{i\},\{i\}} = k, \qquad \sum_{i,j\in[n],\, i<j} A_{ij}\, X_{\{i\},\{j\}} \gtrsim \frac{k\, n^{1/3}}{\log n}. \qquad (21)$$

Therefore, for $\mu k \le c\, n^{1/3}/\log n$ with $c$ a sufficiently small constant, we have
$\sum_{i<j} A_{ij} X_{\{i\},\{j\}} \ge c_* \mu k^2$ and therefore $T(A) = 1$ with high probability.

Consider next $A$ distributed according to hypothesis $H_1$. Note that $A = \mu\, 1_Q 1_Q^{\sf T} + \tilde{A}$, where
$1_Q$ is the indicator vector of set $Q$, and $\tilde{A}$ is distributed according to $H_0$. Since
$\sum_{i<j} A_{ij} X_{\{i\},\{j\}}$ is increasing in $A$, we also have that $T(\tilde{A}) = 1$ implies $T(A) = 1$. As shown above, for
$\mu k \le c\, n^{1/3}/\log n$, we have $T(\tilde{A}) = 1$ with high probability, and hence $T(A) = 1$. □


3 Further definitions and proof strategy

In order to prove $M(G,\alpha) \succeq 0$, we will actually study a new matrix $N(G,\alpha) \in \mathbb{R}^{\binom{[n]}{\le 2}\times\binom{[n]}{\le 2}}$ defined
as follows:

$$N_{A,B} = \alpha_{|A\cup B|} \prod_{i\in A\setminus B,\ j\in B\setminus A} \mathcal{G}_{ij}. \qquad (22)$$

Notice that $M_{A,B} = N_{A,B}\,\mathcal{G}_A \mathcal{G}_B$, i.e. $M$ is obtained from $N$ by zeroing the rows and columns indexed by
sets that do not induce cliques in $G$. Thus, $N \succeq 0$ implies $M \succeq 0$.

We also define the matrix $H \in \mathbb{R}^{(\binom{[n]}{1}\cup\binom{[n]}{2}) \times (\binom{[n]}{1}\cup\binom{[n]}{2})}$ that is the Schur complement of $N$
with respect to entry $N_{\emptyset,\emptyset} = 1$. Formally:

$$H_{A,B} = N_{A,B} - \alpha_{|A|}\,\alpha_{|B|}, \qquad (23)$$

where, as before, we define $\alpha_0 = 1$. Furthermore we denote by $H_{a,b}$, for $a, b \in \{1,2\}$, the restriction
of $H$ to rows indexed by $\binom{[n]}{a}$ and columns indexed by $\binom{[n]}{b}$. (This abuse of notation will not be
a source of confusion in what follows, since we will always use explicit values in $\{1,2\}$ for the
subscripts $a$, $b$.)

Since $H$ is the Schur complement of $N$, $H \succeq 0$ implies $N \succeq 0$ and hence $M \succeq 0$. The next
section is devoted to proving $H \succeq 0$; here we sketch the main ingredients.
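
The passage from $N$ to $H$ can be made concrete with a small numerical sketch. It builds $N$ from an adjacency matrix by taking the product of edge indicators between $A\setminus B$ and $B\setminus A$, forms $H$ by subtracting $\alpha_{|A|}\alpha_{|B|}$, and lets one confirm that this coincides with the usual Schur complement of $N$ with respect to its $(\emptyset,\emptyset)$ entry. The function names, the 0-based vertex labels and the list convention `alpha[0] = 1` are assumptions of this example.

```python
import itertools
import numpy as np

def build_N(adj, alpha):
    """N[A, B] = alpha[|A u B|] * prod of adj[i, j] over i in A \\ B, j in B \\ A."""
    n = adj.shape[0]
    sets = [frozenset()]
    sets += [frozenset([i]) for i in range(n)]
    sets += [frozenset(c) for c in itertools.combinations(range(n), 2)]
    N = np.zeros((len(sets), len(sets)))
    for a, A in enumerate(sets):
        for b, B in enumerate(sets):
            prod = 1
            for i in A - B:
                for j in B - A:
                    prod *= adj[i, j]  # i != j because A \ B and B \ A are disjoint
            N[a, b] = alpha[len(A | B)] * prod
    return N, sets

def build_H(N, sets, alpha):
    """H[A, B] = N[A, B] - alpha[|A|] * alpha[|B|] over the non-empty index sets."""
    sizes = np.array([alpha[len(S)] for S in sets[1:]])
    return N[1:, 1:] - np.outer(sizes, sizes)
```

Since the row of $N$ indexed by the empty set has entries $\alpha_{|B|}$, the matrix returned by `build_H` equals `N[1:, 1:] - N[1:, :1] @ N[:1, 1:] / N[0, 0]`.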

Technically, we control the spectrum of $H$ by first computing eigenvalues and eigenspaces of
its expectation $E H$ and then controlling the random part $H - EH$ by the moment method, i.e.
computing moments of the form $E\,\mathrm{Tr}\{(H - EH)^{2m}\}$. The key challenge is that the simple triangle
inequality $\lambda_{\min}(H) \ge \lambda_{\min}(EH) - \|H - EH\|_2$ is too weak for proving the desired result. We instead
decompose $H$ in its blocks $H_{1,1}$, $H_{1,2}$, $H_{2,2}$ and prove the inequalities stated in Proposition 4.1, cf.
Eqs. (55) to (57). Briefly, these allow us to conclude that:

$$H_{1,1} \succeq 0, \qquad (24)$$

$$H_{2,2} \succeq H_{1,2}^{\sf T}\, H_{1,1}^{-1}\, H_{1,2}, \qquad (25)$$

which are the Schur complement conditions guaranteeing $H \succeq 0$. While characterizing $H_{1,1}$ is
relatively easy (indeed this block is essentially the adjacency matrix of $G$), the most challenging
part of the proof consists in showing a sufficient condition for Eq. (25) (see Eq. (57) below). In
order to prove this bound, we need to decompose $H_{2,2}$ and $H_{1,2}$ along the eigenspaces of $E H_{2,2}$,
and carefully control each of the corresponding sub-blocks.

In the rest of this section we demonstrate the essentials of our strategy to show the weaker
assertion $H_{2,2} \succeq 0$. We will assume that $p$ is of order one, for concreteness $p = 1/2$, which corresponds
to the hidden clique problem. It suffices to show that

$$E H_{2,2} \succeq E H_{2,2} - H_{2,2}. \qquad (26)$$

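The expected value $E H_{2,2}$ has 3 distinct eigenspaces $V_0, V_1, V_2$ that form an orthogonal
decomposition of $\mathbb{R}^{\binom{[n]}{2}}$. Crucially, these spaces admit a simple description as follows:

$$V_0 = \{ v \in \mathbb{R}^{\binom{[n]}{2}} :\ \exists\, u \in \mathbb{R} \text{ s.t. } v_{\{i,j\}} = u \text{ for all } i < j \}, \qquad (27)$$

$$V_1 = \{ v \in \mathbb{R}^{\binom{[n]}{2}} :\ \exists\, u \in \mathbb{R}^n \text{ s.t. } \langle 1_n, u\rangle = 0 \text{ and } v_{\{i,j\}} = u_i + u_j \text{ for all } i < j \}, \qquad (28)$$

$$V_2 = (V_0 \oplus V_1)^{\perp}. \qquad (29)$$

If $\mathcal{P}_a$ is the orthogonal projector onto $V_a$ we have that $E H_{2,2} = \lambda_0 \mathcal{P}_0 + \lambda_1 \mathcal{P}_1 + \lambda_2 \mathcal{P}_2$ where
$\lambda_0 \approx n^2\kappa^4$, $\lambda_1 \approx n\kappa^3$ and $\lambda_2 \approx \kappa^2$ (see Proposition 4.16 for a formal statement).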
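
The three-eigenspace structure can be verified numerically for small $n$. The sketch below assumes, following the definitions of $N$ and $H$ in this section, that the entries of $E H_{2,2}$ depend only on $|A \cap B|$: they equal $\alpha_2 - \alpha_2^2$ on the diagonal, $\alpha_3 p - \alpha_2^2$ when the two pairs share one vertex, and $\alpha_4 p^4 - \alpha_2^2$ when they are disjoint. The closed-form eigenvalues in `predicted_eigenvalues` are the standard ones for a matrix of this form, with multiplicities $1$, $n-1$ and $n(n-3)/2$ on $V_0$, $V_1$, $V_2$; function names and parameter values are illustrative.

```python
import itertools
import math
import numpy as np

def expected_H22(n, p, alpha2, alpha3, alpha4):
    """E H_{2,2}: entries depend only on the overlap of the two pairs (assumed form)."""
    pairs = list(itertools.combinations(range(n), 2))
    E = np.zeros((len(pairs), len(pairs)))
    for a, A in enumerate(pairs):
        for b, B in enumerate(pairs):
            overlap = len(set(A) & set(B))
            if overlap == 2:
                value = alpha2            # A = B: empty product of edge variables
            elif overlap == 1:
                value = alpha3 * p        # one edge variable
            else:
                value = alpha4 * p ** 4   # four edge variables
            E[a, b] = value - alpha2 ** 2
    return E

def predicted_eigenvalues(n, p, alpha2, alpha3, alpha4):
    """Eigenvalues of E H_{2,2} on V_0, V_1, V_2 respectively."""
    lam0 = (alpha2 + 2 * (n - 2) * alpha3 * p
            + math.comb(n - 2, 2) * alpha4 * p ** 4
            - math.comb(n, 2) * alpha2 ** 2)
    lam1 = alpha2 + (n - 4) * alpha3 * p - (n - 3) * alpha4 * p ** 4
    lam2 = alpha2 - 2 * alpha3 * p + alpha4 * p ** 4
    return lam0, lam1, lam2
```

Comparing `np.linalg.eigvalsh(expected_H22(...))` with the predicted values, repeated according to the multiplicities above, confirms the decomposition for any small `n`.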

Now, consider the entry indexed by $\{i,j\}, \{k,\ell\} \in \binom{[n]}{2}$:

$$(H_{2,2})_{\{i,j\},\{k,\ell\}} = -\alpha_2^2 + \alpha_4\, \mathcal{G}_{ik}\mathcal{G}_{i\ell}\mathcal{G}_{jk}\mathcal{G}_{j\ell} \qquad (30)$$

$$= -\alpha_2^2 + \alpha_4 (p + g_{ik})(p + g_{i\ell})(p + g_{jk})(p + g_{j\ell}) \qquad (31)$$

$$= -\alpha_2^2 + \alpha_4 p^4 + \alpha_4 p^3 (g_{ik} + g_{i\ell} + g_{jk} + g_{j\ell})$$
$$\quad + \alpha_4 p^2 (g_{ik}g_{i\ell} + g_{ik}g_{jk} + g_{jk}g_{j\ell} + g_{i\ell}g_{j\ell} + g_{ik}g_{j\ell} + g_{i\ell}g_{jk})$$
$$\quad + \alpha_4 p\, (g_{ik}g_{i\ell}g_{jk} + g_{ik}g_{jk}g_{j\ell} + g_{ik}g_{i\ell}g_{j\ell} + g_{i\ell}g_{jk}g_{j\ell}) + \alpha_4\, g_{ik}g_{i\ell}g_{jk}g_{j\ell}. \qquad (32)$$

The decomposition Eq. (32) holds only when $\{i,j\}$ and $\{k,\ell\}$ are disjoint. Since the number of
pairs that intersect is at most $n^3 \ll n^4$, it is natural to conjecture that these pairs
are negligible, and in this outline we shall indeed assume that this is true (the complete proof
deals with these pairs as well). The random portion $E H_{2,2} - H_{2,2}$ involves the last 15 terms of the
above decomposition. Each term is indexed by a pair $(\eta,\nu)$ where $1 \le \eta \le 4$ denotes the number
of $g_{ij}$ variables in the term and $1 \le \nu \le \binom{4}{\eta}$ the exact choice of $\eta$ (out of 4) variables used. In
accordance with the notation used in the proof, we let $J_{\eta,\nu}$ denote the matrix whose $\{i,j\},\{k,\ell\}$ entry
is the $(\eta,\nu)$ term in the decomposition Eq. (32). See Table 1 and Eq. (178) for a formal definition
of the matrices $J_{\eta,\nu}$. Hence we obtain (the $\approx$ below is due to the intersecting pairs, which we have
ignored):

$$H_{2,2} - E H_{2,2} \approx \sum_{\eta}\sum_{\nu} J_{\eta,\nu}. \qquad (33)$$
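We are therefore left with the task of proving

$$E H_{2,2} \succeq Q \equiv -\sum_{\eta}\sum_{\nu} J_{\eta,\nu}. \qquad (34)$$

Viewed in the decomposition given by $V_0, V_1, V_2$, Eq. (34) is satisfied if:

$$\begin{pmatrix} \lambda_0 & 0 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & \lambda_2 \end{pmatrix} \succeq \begin{pmatrix} \|\mathcal{P}_0 Q \mathcal{P}_0\|_2 & \|\mathcal{P}_0 Q \mathcal{P}_1\|_2 & \|\mathcal{P}_0 Q \mathcal{P}_2\|_2 \\ \|\mathcal{P}_1 Q \mathcal{P}_0\|_2 & \|\mathcal{P}_1 Q \mathcal{P}_1\|_2 & \|\mathcal{P}_1 Q \mathcal{P}_2\|_2 \\ \|\mathcal{P}_2 Q \mathcal{P}_0\|_2 & \|\mathcal{P}_2 Q \mathcal{P}_1\|_2 & \|\mathcal{P}_2 Q \mathcal{P}_2\|_2 \end{pmatrix}. \qquad (35)$$

The bulk of the proof is devoted to developing operator norm bounds for the matrices $\mathcal{P}_a J_{\eta,\nu} \mathcal{P}_b$
that hold with high probability. We then bound $\mathcal{P}_a Q \mathcal{P}_b$ using the triangle inequality

$$\|\mathcal{P}_a Q \mathcal{P}_b\|_2 \le \sum_{\eta,\nu} \|\mathcal{P}_a J_{\eta,\nu} \mathcal{P}_b\|_2. \qquad (36)$$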


The matrices $J_{4,1}$, $J_{3,\nu}$, $J_{2,1}$, $J_{2,6}$ turn out to have an approximate “Wigner”-like behavior, in
the following sense. Note that these are symmetric matrices of size $\binom{n}{2} \approx n^2/2$ with random
zero-mean entries bounded by $\alpha_4$. If their entries were independent, they would have operator
norms of order $\alpha_4 \sqrt{n^2/2} \approx \kappa^4 n$ [FK81]. Although the entries are actually not independent, the
conclusion still holds for $J_{4,1}$, $J_{3,\nu}$, $J_{2,1}$, $J_{2,6}$ and they have operator norms of order $\kappa^4 n$. Hence
$\|\mathcal{P}_a J_{\eta,\nu} \mathcal{P}_b\|_2 \le \|J_{\eta,\nu}\|_2 \lesssim \kappa^4 n$ for these cases.

We are now left with the cases $(J_{1,\nu})_{1\le\nu\le 4}$ and $(J_{2,\nu})_{2\le\nu\le 5}$. These require more care, since their
typical norms are significantly larger than $\kappa^4 n$. For instance consider $J_{1,1}$, where

$$(J_{1,1})_{\{i,j\},\{k,\ell\}} = \alpha_4 p^3\, g_{ik}. \qquad (37)$$

Viewed as a matrix in $\mathbb{R}^{n^2\times n^2}$, $J_{1,1}$ corresponds to the matrix $\alpha_4 p^3\, g \otimes (1_n 1_n^{\sf T})$ where $\otimes$ denotes the
standard Kronecker product and $g \in \mathbb{R}^{n\times n}$ is the matrix with $(i,j)$ entry being $g_{ij}$. By standard
results on Wigner random matrices [FK81], $\|g\|_2 \lesssim \sqrt{n}$ with high probability. Hence:

$$\| g \otimes 1_n 1_n^{\sf T} \|_2 = \|g\|_2\, \|1_n 1_n^{\sf T}\|_2 \lesssim n^{3/2}, \qquad (38)$$

with high probability. This suggests that $\|J_{1,1}\|_2 \lesssim \alpha_4 n^{3/2} \approx \kappa^4 n^{3/2}$ with high probability. This
turns out to be the correct order for all the matrices $J_{1,\nu}$ and $J_{2,\nu}$ under consideration.
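
The norm identity behind this heuristic, namely that the operator norm of a Kronecker product is the product of the operator norms (so that tensoring with the all-ones matrix multiplies the norm by exactly $n$), is easy to confirm numerically. The sketch below uses a small centered adjacency matrix of an Erdős-Rényi graph; the function names and the choice of `n` are illustrative.

```python
import numpy as np

def centered_adjacency(n, p, rng):
    """Matrix g with entries G_ij - p for i != j and zero diagonal."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    G = upper + upper.T
    g = G - p
    np.fill_diagonal(g, 0.0)
    return g

def kron_norms(g):
    """Return (||g||_2, ||g (x) 1 1^T||_2), both computed as spectral norms."""
    n = g.shape[0]
    ones_block = np.ones((n, n))  # the rank-one matrix 1_n 1_n^T, of norm n
    return np.linalg.norm(g, 2), np.linalg.norm(np.kron(g, ones_block), 2)
```

The Kronecker product has size $n^2 \times n^2$, so this check should be kept to small `n` (a few tens).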

This heuristic calculation shows the need to be careful with these terms. Indeed, a naive
application of this result yields that $\|\mathcal{P}_a Q \mathcal{P}_b\|_2 \lesssim \kappa^4 n^{3/2}$. Recalling Eq. (35), this imposes that
$\lambda_2 \gtrsim \kappa^4 n^{3/2}$. Since we have $\lambda_2 \approx \kappa^2$, we obtain the condition $\kappa \ll n^{-3/4}$. The parameter $\kappa$ turns
out to be related to the size of the planted clique through $k \approx n\kappa$. Hence this argument can only
prove that the SOS hierarchy fails to detect hidden cliques of size $k \ll n^{1/4}$.

In order to improve over this, and establish Theorem 1, we prove that the matrices $J_{1,\nu}$ and $J_{2,\nu}$
satisfy certain spectral properties with respect to the subspaces $V_0, V_1, V_2$. For instance consider
the sum $J_{2,3} + J_{2,5}$. For any $v \in \mathbb{R}^{\binom{[n]}{2}}$:

$$\big((J_{2,3} + J_{2,5})\, v\big)_{\{i,j\}} = \sum_{k<\ell} \alpha_4 p^2\, (g_{ik}g_{i\ell} + g_{jk}g_{j\ell})\, v_{\{k,\ell\}} \qquad (39)$$

$$= u_i + u_j, \qquad (40)$$

where we let $u_i \equiv \sum_{k<\ell} \alpha_4 p^2\, g_{ik}g_{i\ell}\, v_{\{k,\ell\}}$. It follows that $(J_{2,3} + J_{2,5})\, v \in V_0 \oplus V_1$, hence
$\mathcal{P}_2 (J_{2,3} + J_{2,5}) = 0$. By taking transposes we obtain that $(J_{2,2} + J_{2,4})\,\mathcal{P}_2 = 0$. In a similar fashion we obtain
that $\mathcal{P}_2 (\sum_{\nu} J_{1,\nu}) = (\sum_{\nu} J_{1,\nu})\, \mathcal{P}_2 = 0$. See Lemmas 4.23 and 4.24 for formal statements and proofs.
Using these observations and Eq. (36) we obtain that $\|\mathcal{P}_2 Q \mathcal{P}_2\|_2 \lesssim \kappa^4 n$, while for any other pair
$(a,b) \in \{0,1,2\}^2$ we have that $\|\mathcal{P}_a Q \mathcal{P}_b\|_2 \lesssim \kappa^4 n^{3/2}$. As noted before, $\lambda_0 \approx n^2\kappa^4$, $\lambda_1 \approx n\kappa^3$
and $\lambda_2 \approx \kappa^2$, whence the condition in Eq. (35) reduces to:

$$\begin{pmatrix} n^2\kappa^4 & 0 & 0 \\ 0 & n\kappa^3 & 0 \\ 0 & 0 & \kappa^2 \end{pmatrix} - \kappa^4 \begin{pmatrix} n^{3/2} & n^{3/2} & n^{3/2} \\ n^{3/2} & n^{3/2} & n^{3/2} \\ n^{3/2} & n^{3/2} & n \end{pmatrix} \succeq 0. \qquad (41)$$

The (2,2) entry of this matrix inequality yields that $\kappa^2 - \kappa^4 n \gtrsim 0$ or $\kappa \lesssim n^{-1/2}$. Considering the
(1,1) entry yields a similar condition. The key condition is that corresponding to the minor indexed
by rows (and columns) 1, 2:

$$\begin{pmatrix} n\kappa^3 & -n^{3/2}\kappa^4 \\ -n^{3/2}\kappa^4 & \kappa^2 \end{pmatrix} \succeq 0. \qquad (42)$$

This requires that $n\kappa^5 \gtrsim n^3\kappa^8$ or, equivalently, $\kappa \lesssim n^{-2/3}$. Translating this to clique size $k = n\kappa$, we
obtain the condition $k \lesssim n^{1/3}$. This calculation thus demonstrates the origin of the threshold of $n^{1/3}$
beyond which the Meka-Wigderson witness fails to be positive semidefinite. The counterexample
of [BS14] shows that our estimates are fairly tight (indeed, up to a logarithmic factor).
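
The threshold in this heuristic can be explored numerically. The sketch below forms the $3\times 3$ matrix obtained by subtracting $\kappa^4$ times the matrix of orders $n^{3/2}$ (and $n$ in the last diagonal entry) from $\mathrm{diag}(n^2\kappa^4, n\kappa^3, \kappa^2)$, with all implicit constants set to 1, and checks positive semidefiniteness after a diagonal rescaling (a congruence, which does not change whether the matrix is positive semidefinite but avoids comparing entries of very different magnitude). Setting the constants to 1 is an assumption of this example, so only the exponent of the threshold is meaningful.

```python
import numpy as np

def heuristic_condition(n, kappa):
    """Check the heuristic 3x3 condition with all constants set to 1 (assumption)."""
    lam = np.array([n ** 2 * kappa ** 4, n * kappa ** 3, kappa ** 2])
    noise = kappa ** 4 * np.array([[n ** 1.5, n ** 1.5, n ** 1.5],
                                   [n ** 1.5, n ** 1.5, n ** 1.5],
                                   [n ** 1.5, n ** 1.5, n]])
    M = np.diag(lam) - noise
    scale = 1.0 / np.sqrt(lam)
    M_scaled = M * np.outer(scale, scale)  # D^{-1/2} M D^{-1/2}
    return bool(np.linalg.eigvalsh(M_scaled).min() >= 0.0)
```

For large `n`, the condition holds when `kappa` is well below $n^{-2/3}$ and fails when it is well above, in line with the discussion of the minor (42).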


4 Proofs 

4.1 Definitions and notations 

Throughout the proof we denote the identity matrix in $m$ dimensions by $I_m$, and the all-ones vector
by $1_m$. We let $Q_n = 1_n 1_n^{\sf T}/n$ be the projector onto the all-ones vector $1_n$, and $Q_n^{\perp} = I_n - Q_n$ its
orthogonal complement.
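
As a quick sanity check of this notation, the two projectors can be formed explicitly; the function name is illustrative.

```python
import numpy as np

def all_ones_projectors(n):
    """Return (Q_n, Q_n_perp): projector onto span(1_n) and onto its complement."""
    ones = np.ones((n, 1))
    Q = ones @ ones.T / n      # rank-one projector onto the all-ones direction
    Q_perp = np.eye(n) - Q     # projector onto vectors whose entries sum to zero
    return Q, Q_perp
```

Both matrices are symmetric and idempotent, and their product vanishes.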

The indicator function of property $A$ is denoted by $1(A)$. The set of first $m$ integers is denoted
by $[m] = \{1, 2, \dots, m\}$.

As mentioned above, we write $f(n,r,\dots) \gtrsim g(n,r,\dots)$ if there exists a constant $C$ such that
$f(n,r,\dots) \ge C\, g(n,r,\dots)$. Similarly we write $f(n,r,\dots) \gg g(n,r,\dots)$ if, for any constant $C$, we
have $f(n,r,\dots) \ge C\, g(n,r,\dots)$ for all $n$ large enough. These conditions are always understood
to hold uniformly with respect to the extra arguments $r, \dots$, provided these belong to a range
depending on $n$, that will be clear from the context.

We finally use the shorthand $\bar{n} = n \log n$.


4.2 Main technical result and proof of Theorem 1

The key proposition is the following, which controls the matrices $H_{a,b}$. A set of conditions for the
parameters $\alpha$ is stated in terms of two matrices $\bar{W}, W \in \mathbb{R}^{3\times 3}$. Below we will develop
approximations to these matrices, under the parameter values of Theorem 1. This allows us to check easily the
conditions of Proposition 4.1.
Proposition 4.1. Consider the symmetric matrices $\bar{W}, W \in \mathbb{R}^{3\times 3}$, where $\bar{W}$ is diagonal, and given
by:

$$\bar{W}_{00} = \alpha_2 + 2(n-2)\,\alpha_3 p + \frac{(n-2)(n-3)}{2}\,\alpha_4 p^4 - \frac{n(n-1)}{2}\,\alpha_2^2,$$

$$\bar{W}_{11} = \alpha_2 + (n-4)\,\alpha_3 p - (n-3)\,\alpha_4 p^4,$$

$$\bar{W}_{22} = \alpha_2 - 2\alpha_3 p + \alpha_4 p^4,$$

and W is defined by: 

w m = Ca 3 n^ + Cat*** + + (» 3/ W + + C« 3 <V) 2 , 

a\ n[a2P — af) 

C — 

Woi = Cash 1 / 2 + ( 7 o 4 ? 7 3 / 2 H-(o 3 77 )(< 7 o 3 n + \/na 2 ) 

a 4 

H- 7 -ov( n}/ 2 asp 2 + 2^/na2 + < 7 o 3 77 )( 3 o 3 77 ), 


= \2 


n(o 2 /7 - a?) 

W 02 = Cash}/ 2 + <7o 4 ?! 3 / 2 + ^i 03 ™) 
C 


a 1 


+ 


77 ( 02/7 - o 2 ) 


n}/ 2 asp 2 + 2 y / na !2 + <7a 3 ?7^ (o 3 n) , 


W 11 = Casr}/ 2 + Cain 3 / 2 + — (Ca 3 n + i/™^) 2 H- ^( a 3 r 0 ^ 

a 4 n(a 2 p — of) 

W12 = Cash 1 / 2 + Caih 3 / 2 + — ( a3h)(Ca3h + -v/na 2 ) H- ^( a3 ^) ; 

ai 77(02/7 — ay 

_ _!/ 2 „ _ < 7 ( 0377.) 2 <7(o 3 n) 2 

IP22 — <70377 ' + < 7 o 4 ?7 H- 1 - 


Ol 77(02/7 — of) 
Assume the following conditions hold for a suitable constant C: 

a 1 > 202/7 + 2a277 1//2 , 
a 2 p 2 > of , 

T/ien with probability exceeding 1 — n _1 all of the following are true: 

Hu h 0, 


#n =< 


1 


77(02/7 — of) 


Qn + Q n , 


Ol 


^22 y —h 1 2 q}h 12 + 


Ol 


n(a 2 /7 - of ) 


Hj 2 Q n H 12 . 


(43) 

(44) 

(45) 


(46) 


(47) 


(48) 

(49) 

(50) 

(51) 


(52) 

(53) 

(54) 


(55) 

(56) 

(57) 


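The next two lemmas develop simplified expressions for the matrices $\overline{W}, W$ under the parameter
choices of Theorem 1.

Lemma 4.2. Setting $(\alpha, p)$ as in Theorem 1, there exists $\delta_n = \delta_n(\kappa, p)$ with $\delta_n(\kappa, p) \to 0$ as $n \to \infty$,
such that

$$\Big|\overline{W}_{00} - \frac{2n^2\kappa^4}{p^2}\Big| \le \delta_n \overline{W}_{00} , \tag{58}$$
$$\Big|\overline{W}_{11} - \frac{n\kappa^3}{p^2}\Big| \le \delta_n \overline{W}_{11} , \tag{59}$$
$$\Big|\overline{W}_{22} - \frac{2\kappa^2}{p}\Big| \le \delta_n \overline{W}_{22} . \tag{60}$$

Lemma 4.3. Setting $(\alpha, p)$ as in Theorem 1, there exists $\delta_n = \delta_n(\kappa, p)$ with $\delta_n(\kappa, p) \to 0$ as $n \to \infty$,
such that, for some absolute constant $C$,

$$\Big|W_{00} - \frac{n^2\kappa^4}{p^2}\Big| \le \delta_n W_{00} , \tag{61}$$
$$\Big|W_{11} - C\,\frac{\kappa^4\tilde{n}^{3/2}}{p^6}\Big| \le \delta_n W_{11} , \tag{62}$$
$$\Big|W_{22} - C\,\frac{\kappa^3\sqrt{\tilde{n}}}{p^3} - C\,\frac{\kappa^5\tilde{n}^2}{p^6}\Big| \le \delta_n W_{22} , \tag{63}$$

and, for every $a \ne b \in \{0, 1, 2\}$,

$$\Big|W_{ab} - C\,\frac{\kappa^4\tilde{n}^{3/2}}{p^6}\Big| \le \delta_n W_{ab} . \tag{64}$$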

With Proposition 4.1 and the auxiliary Lemmas 4.2 and 4.3 in hand, the proof of Theorem 1 is
straightforward.

Proof of Theorem 1. As noted in Section 3 it suffices to prove that $H \succeq 0$. By taking the Schur
complement with respect to $H_{11}$, we obtain that $H \succeq 0$ if and only if

$$H_{11} \succeq 0 \quad \text{and} \quad H_{22} \succeq H_{12}^{\sf T} H_{11}^{-1} H_{12} . \tag{65}$$

Suppose that the conditions of Proposition 4.1 are verified under the values of $\alpha, p$ specified in
Theorem 1. Then we have $H_{11} \succeq 0$ by Eq. (55). Further, by Eqs. (56) and (57), we have

$$H_{22} \succeq H_{12}^{\sf T}\Big(\frac{2}{\alpha_1}\, Q_n^{\perp} + \frac{1}{n(\alpha_2 p - \alpha_1^2)}\, Q_n\Big) H_{12} \tag{66}$$
$$\succeq H_{12}^{\sf T} H_{11}^{-1} H_{12} , \tag{67}$$

which yields the desired (65).
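The Schur complement step can be checked numerically on small matrices. The sketch below is only an illustration on a generic block matrix, not on the witness $H$ itself; the function names, the tolerance and the two test matrices are assumptions of this example, and $H_{11}$ is assumed to be invertible.

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """True if the symmetric part of M has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol)

def psd_via_schur(H11, H12, H22, tol=1e-9):
    """Check H11 >= 0 and H22 - H12^T H11^{-1} H12 >= 0 (H11 assumed invertible)."""
    S = H22 - H12.T @ np.linalg.solve(H11, H12)
    return is_psd(H11, tol) and is_psd(S, tol)

# A positive definite example: B B^T + 0.1 I.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
H = B @ B.T + 0.1 * np.eye(5)
H11, H12, H22 = H[:2, :2], H[:2, 2:], H[2:, 2:]

# An indefinite example with an invertible upper-left block.
H_bad = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])
G11, G12, G22 = H_bad[:2, :2], H_bad[:2, 2:], H_bad[2:, 2:]

print(psd_via_schur(H11, H12, H22), is_psd(H))
print(psd_via_schur(G11, G12, G22), is_psd(H_bad))
```

In both cases the block criterion agrees with a direct eigenvalue check on the full matrix.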

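We are now left to verify the conditions of Proposition 4.1. To begin, we verify that $\alpha_1 \ge 2\alpha_2 p + 2\alpha_2\tilde{n}^{1/2}$. This condition is satisfied if:

$$p \gg \kappa\,\tilde{n}^{1/2} . \tag{68}$$

For this, it suffices that

$$(\kappa\log n)^{1/4} n^{1/6} \gg \kappa\,\tilde{n}^{1/2} , \tag{69}$$

or

$$\kappa \ll n^{-4/9}(\log n)^{-1/3} . \tag{70}$$

Since $\kappa \ll n^{-2/3}$, this is true.

The condition $\alpha_2 p - \alpha_1^2 \ge 0$ holds since $\alpha_2 p - \alpha_1^2 = 2\kappa^2 - \kappa^2 = \kappa^2 > 0$.

It remains to check that $\overline{W} \succeq W$. By Sylvester's criterion, we need to verify that:

$$\overline{W}_{00} - W_{00} > 0 , \tag{71}$$

$$\begin{vmatrix} \overline{W}_{00} - W_{00} & -W_{01} \\ -W_{01} & \overline{W}_{11} - W_{11} \end{vmatrix} > 0 , \tag{72}$$

$$\begin{vmatrix} \overline{W}_{00} - W_{00} & -W_{01} & -W_{02} \\ -W_{01} & \overline{W}_{11} - W_{11} & -W_{12} \\ -W_{02} & -W_{12} & \overline{W}_{22} - W_{22} \end{vmatrix} > 0 . \tag{73}$$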
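Sylvester's criterion, as used in Eqs. (71)-(73), can be sketched in a few lines. The helper names and the toy matrices `W_bar` and `W` below are hypothetical and unrelated to the actual values in Proposition 4.1; they only illustrate the check of the three leading principal minors of $\overline{W} - W$.

```python
import numpy as np

def leading_principal_minors(M):
    """Determinants of the top-left k x k blocks, k = 1, ..., dim(M)."""
    M = np.asarray(M, dtype=float)
    return [float(np.linalg.det(M[:k, :k])) for k in range(1, M.shape[0] + 1)]

def is_positive_definite_sylvester(M):
    """Sylvester's criterion: all leading principal minors are strictly positive."""
    return all(minor > 0 for minor in leading_principal_minors(M))

# Toy stand-ins: a diagonal W_bar and a symmetric W with small entries.
W_bar = np.diag([4.0, 3.0, 2.0])
W = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])

print(leading_principal_minors(W_bar - W))
print(is_positive_definite_sylvester(W_bar - W))
```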





It suffices to check the above conditions using the simplifications provided by Lemmas 4.2 and 4.3
respectively, as follows. Throughout, we will assume that $n$ is large enough, and write $\delta_n$ for a generic
sequence such that $\delta_n \to 0$ uniformly over $\kappa \in [\log n/n,\; c^{-4} n^{-2/3}/\log n]$, $p \in [c(\kappa\log n)^{1/4} n^{1/6},\; 1]$.
For Eq. (71), using Lemmas 4.2 and 4.3 we have that:

$$\overline{W}_{00} - W_{00} \ge \frac{n^2\kappa^4}{2p^2} . \tag{74}$$

Hence $\overline{W}_{00} - W_{00} \ge n^2\kappa^4/(2p^2) > 0$ for large enough $n$.

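For Eq. (72) to hold we need:

$$(\overline{W}_{00} - W_{00})(\overline{W}_{11} - W_{11}) - W_{01}^2 > 0 . \tag{75}$$

By Lemmas 4.2 and 4.3 we have:

$$\overline{W}_{11} - W_{11} \ge \frac{n\kappa^3}{p^2}(1 - \delta_n) - \frac{C\kappa^4\tilde{n}^{3/2}}{p^6}(1 + \delta_n) . \tag{76}$$

The ratio of the two terms above is (up to a constant) given by $p^4/(\kappa\, n^{1/2}(\log n)^{3/2}) \to \infty$, hence
for $n$ large enough we have $\overline{W}_{11} - W_{11} \ge n\kappa^3/(2p^2)$. Thus Eq. (72) holds if

$$\frac{n^2\kappa^4}{p^2}\cdot\frac{n\kappa^3}{p^2} \gg \Big(\frac{\kappa^4\tilde{n}^{3/2}}{p^6}\Big)^2 , \tag{77}$$

or

$$p^8 \gg \kappa(\log n)^3 . \tag{78}$$

Since we set $p \ge c(\kappa\log n)^{1/4} n^{1/6}$, this is satisfied for $n$ large. Indeed this implies that:

$$\begin{vmatrix} \overline{W}_{00} - W_{00} & -W_{01} \\ -W_{01} & \overline{W}_{11} - W_{11} \end{vmatrix} \ge \frac{n^3\kappa^7}{2p^4} . \tag{79}$$

Consider now Eq. (73). Expanding the determinant along the third column, Eq. (73) is equivalent to

$$(\overline{W}_{22} - W_{22})\begin{vmatrix} \overline{W}_{00} - W_{00} & -W_{01} \\ -W_{01} & \overline{W}_{11} - W_{11} \end{vmatrix} + W_{12}\begin{vmatrix} \overline{W}_{00} - W_{00} & -W_{01} \\ -W_{02} & -W_{12} \end{vmatrix} - W_{02}\begin{vmatrix} -W_{01} & \overline{W}_{11} - W_{11} \\ -W_{02} & -W_{12} \end{vmatrix} > 0 . \tag{80}$$

We start by noting that, for all $n$ large enough,

$$\overline{W}_{22} - W_{22} \ge \frac{3\kappa^2}{2p} . \tag{81}$$

Indeed, by Lemmas 4.2 and 4.3, to prove this claim it is sufficient to show that

$$\frac{\kappa^2}{p} \ge C\Big(\frac{\kappa^5\tilde{n}^2}{p^6} + \frac{\kappa^3\tilde{n}^{1/2}}{p^3}\Big) \tag{82}$$

for a large enough constant $C$, or:

$$p \ge C\max\big(n^{2/5}\kappa^{3/5}(\log n)^{2/5},\; \kappa^{1/2}(n\log n)^{1/4}\big) . \tag{83}$$

This is satisfied when we choose $p \ge c(\kappa\log n)^{1/4} n^{1/6}$ with $c$ a large enough constant.
Along with the argument for the second condition above, this implies that:

$$(\overline{W}_{22} - W_{22})\begin{vmatrix} \overline{W}_{00} - W_{00} & -W_{01} \\ -W_{01} & \overline{W}_{11} - W_{11} \end{vmatrix} \ge \frac{n^3\kappa^9}{2p^5} , \tag{84}$$

for large enough $n$.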

We now consider the second term. Let $w = C\kappa^4\tilde{n}^{3/2}/p^6$. Then by Lemmas 4.2 and 4.3, for all
$n$ large enough:

$$0 \le -W_{12}\begin{vmatrix} \overline{W}_{00} - W_{00} & -W_{01} \\ -W_{02} & -W_{12} \end{vmatrix} \le \frac{3}{2}\, w^2\Big(\frac{n^2\kappa^4}{p^2} + w\Big) \tag{85}$$
$$\le \frac{2n^2\kappa^4 w^2}{p^2} , \tag{86}$$

as $n^2\kappa^4/p^2 \ge 2w$ whenever $p \gtrsim (\log n)^{3/8} n^{-1/8}$. As we have $p \ge n^{-1/12}$ this is satisfied.
Similarly, for the third term

$$0 \le W_{02}\begin{vmatrix} -W_{01} & \overline{W}_{11} - W_{11} \\ -W_{02} & -W_{12} \end{vmatrix} \le \frac{3w^2}{2}\Big(w + \frac{n\kappa^3}{p^2}\Big) . \tag{87}$$

The second term in the parentheses above dominates when $p \gtrsim \kappa^{1/4}(\log n)^{3/8} n^{1/8}$, which holds as
we keep $p \ge c(\kappa\log n)^{1/4} n^{1/6}$. Hence:

$$W_{02}\begin{vmatrix} -W_{01} & \overline{W}_{11} - W_{11} \\ -W_{02} & -W_{12} \end{vmatrix} \le \frac{2n\kappa^3 w^2}{p^2} . \tag{88}$$

Thus, using Eqs. (84), (86), (88), we conclude that Eq. (73) holds if

$$\frac{n^3\kappa^9}{2p^5} \ge \frac{2n^2\kappa^4 w^2}{p^2} + \frac{2n\kappa^3 w^2}{p^2} \tag{89}$$
$$= \frac{2(1 + n\kappa)\, n\kappa^3 w^2}{p^2} . \tag{90}$$

For this, it suffices that:

$$\frac{n^3\kappa^9}{p^5} \gg \frac{n^2\kappa^4 w^2}{p^2} , \tag{91}$$

or, equivalently, $p^9 \ge c_1 n^2\kappa^3(\log n)^3$ for an appropriate $c_1$ large enough. This holds under the
stated condition $p \ge c(\kappa\log n)^{1/4} n^{1/6}$ provided $c$ is large enough. This completes the proof of
Theorem 1. □


The proofs of Lemmas 4.3 and 4.2 follow by a simple calculation and are given in Section 4.3.

Our key technical result is Proposition 4.1. Its proof is organized as follows. We analyze the
expectation matrices $\mathbb{E}\{H_{22}\}$, $\mathbb{E}\{H_{12}\}$ in Section 4.5. We then control the random components
$H_{11} - \mathbb{E}\{H_{11}\}$ in Section 4.6, $H_{12} - \mathbb{E}\{H_{12}\}$ in Section 4.8, and $H_{22} - \mathbb{E}\{H_{22}\}$ in Section 4.7. The
application of the moment method to these deviations requires the definition of various specific
graph primitives, which we isolate in Section 4.4 for easy reference. Finally, we combine the results
to establish Proposition 4.1 in Section 4.9.


4.3 Proofs of Lemmas 4.3 and 4.2

Proof of Lemma 4.3. Recall that $W_{00}$ is defined as:

$$W_{00} = C\alpha_3\tilde{n}^{1/2} + C\alpha_4\tilde{n}^{3/2} + \frac{C(\alpha_3\tilde{n})^2}{\alpha_1} + \frac{(n\sqrt{n}\,\alpha_3 p^2 + 2\sqrt{n}\,\alpha_2 + C\alpha_3\tilde{n})^2}{n(\alpha_2 p - \alpha_1^2)} . \tag{92}$$

Firstly, since $p \ge c(\kappa\log n)^{1/4} n^{1/6}$ and $n\kappa \ge \log n$, we have that $p \ge n^{-1/12}$ asymptotically.
Hence:

riy/nazp 2 


a^n 


2 /— 
P yjn 

log n 


Similarly: 


Also: 


riy/nasp 2 nn 
y/na 2 2 


00. 


00. 


aqn 3 / 2 


< 


« 4 n 2 


n 3 a 2 p 4 /n(a 2 P ~ a 2 ) ~ (n 3 n 6 /?in 2 p 2 ) 

(a 3 n) 2 /ai) 


log" n log" n 

np A ~ n 7 / 8 

2 „ „ 2 , 


0 . 


n 3 a\p 2 ! n(a. 2 P — ot\) ~ p‘ 


< k log^ n k log^ n 


n 4 — 


0 . 


n 


a 3 n 


r? 4 / 2 K 3 /p 3 


< 




OLtpa 3 ! 2 K A n/p 3 nn 


0 . 


(93) 

(94) 

(95) 

(96) 

(97) 

(98)
Hence the term $(n\sqrt{n}\,\alpha_3 p^2)^2 / (n(\alpha_2 p - \alpha_1^2))$ is dominant in $W_{00}$ and the first claim of the lemma
follows.

For Hoi we have the equation: 

C — 

Hoi = a 3 h 1//2 + Catft’! 2 H- (a 3 n)(a 3 (n + -v/maq)) 

a i 

H- 7 ---7 ^(ny/na 3 p 2 + 2y/na 2 +a 3 n)(a 3 n). (99) 

n(a 2 p - af) 


It suffices to check that C'aqh 3 / 2 is the dominant term. By the argument in Woo we already have 
that the first term is negligible. Further since, a. 3 n/ yfna .2 = Ky/n log n/p 2 = (ftlogn ) 1 / 2 ?! 1 / 6 —> 0, 
to prove that the third term is negligible, it suffices that 


( a 3 n)(y/na 2 ) 


n 5 p 3 


n 5 p 6 \/Iogn 


p 


aqaqn 3 / 2 

By the estimates in IFoo the fourth term is negligible if: 

(riy/na 3 p 2 )(a 3 n) 


y/Togn 


0 . 


0 


i.e. 


0 . 


n(a2P — a\)ain 3 / 2 
n 5 / 2 log np~ 4 k 6 p 2 

n 5 / 2 logn 3 / 2 p _ 6 K 6 y/\ogn 

This implies the claim for $W_{01}$. The calculation for $W_{02}$ and $W_{12}$ is similar.
We now consider $W_{11}$, given by:

Wn = a 3 n 1/2 + Cuaqn 3 / 2 


( 100 ) 

( 101 ) 

( 102 ) 


+ 


c 


(Ca 3 n + yfnoL 3 p 2 + 2a 2 ) 2 4- 

' ' nrt l -n _ 


, 2 X- ( 103 ) 

cci n{a2P — af) 

As in Woo, the first term is negligible. For the third term, first we note that a 3 n/a .2 = (k log n)n/p 2 > 
log 2 n —> 00 . Hence to prove that the third term is negligible, it suffices that: 

(a 3 h ) 2 /=■ x n 

-< kVu ->• 0 . 

The hnal term in Wn is negligible by the same argument, since n{a. 2 P — a 2 ) = mi 2 > ai. 

W 22 is given by: 

W22 = a 3 n 1 ^ 2 + Ca^n 

C(a 3 n) 2 


(104) 


+ C(a 3 n) 2 + 

ai n(a 2 P — a 2 ) 


(105) 


Since n(a 2 P — a 2 ) = nn 2 > ai it is easy to see that the third term dominates the fourth above. To 
see that the first dominates the second, it suffices that their ratio diverge i.e. 

(106) 


a 3 n 1 / 2 


p u 


Ol/pT, 


Ky/n 


> 


p 


k log n\Jn 
= c 3 (k log n) 1 ^ 8 n 1,/4 


00, 




(107) 

(108) 



as $\kappa \ge 1/n$. Thus we have that the first and third terms dominate the contribution for $W_{22}$. This
completes the proof of the lemma. □


Proof of Lemma 4.2. $\overline{W}_{00}$ is given by:

$$\overline{W}_{00} = \alpha_2 + 2(n-2)\alpha_3 p + \frac{(n-2)(n-3)}{2}\alpha_4 p^4 - \frac{n(n-1)}{2}\alpha_2^2 . \tag{109}$$

It is straightforward to check that the third and fourth terms dominate the sum above, i.e.:

$$\frac{\overline{W}_{00}}{\frac{(n-2)(n-3)}{2}\alpha_4 p^4 - \frac{n(n-1)}{2}\alpha_2^2} \to 1 . \tag{110}$$

Further we have:

$$\frac{(n-2)(n-3)}{2}\alpha_4 p^4 - \frac{n(n-1)}{2}\alpha_2^2 = (1 + \delta_n)\,\frac{2n^2\kappa^4}{p^2} , \tag{111}$$

for some $\delta_n \to 0$. The claim for $\overline{W}_{00}$ then follows.

The claims for $\overline{W}_{11}$ and $\overline{W}_{22}$ follow in the same fashion as above, where we instead use the
following, adjusting $\delta_n$ appropriately:

$$\frac{\overline{W}_{11}}{n\alpha_3 p} = \frac{\alpha_2 + (n-4)\alpha_3 p - (n-3)\alpha_4 p^4}{n\alpha_3 p} \to 1 , \tag{112}$$
$$\frac{\overline{W}_{22}}{\alpha_2} = \frac{\alpha_2 - 2\alpha_3 p + \alpha_4 p^4}{\alpha_2} \to 1 . \tag{113}$$

□


4.4 Graph definitions and moment method 

In this section we define some families of graphs that will be useful in the moment calculations of
Sections 4.6, 4.7 and 4.8. We then state and prove a moment method lemma that will be our basic
tool for controlling the norm of random matrices.

Definition 4.4. A cycle of length $m$ is a graph $D = (V, E)$ with vertices $V = \{v_1, \dots, v_m\}$ and
edges $E = \{\{v_i, v_{i+1}\} : i \in [m]\}$, where addition is taken modulo $m$.

Definition 4.5. A couple is an ordered pair of vertices $(u, v)$, where we refer to the first vertex in
the couple as the head and the second as the tail.

Definition 4.6. A bridge of length $2m$ is a graph $B = (V, E)$ with vertex set $V = \{u_i, v_i, w_i : i \in [m]\}$, and edges $E = \{\{u_i, v_i\}, \{u_i, w_i\}, \{u_{i+1}, v_i\}, \{u_{i+1}, w_i\} : i \in [m]\}$, where addition above is
modulo $m$. We regard $(v_i, w_i)$ for $i \in [m]$ as couples in the bridge.

Definition 4.7. A ribbon of length $m$ is a graph $R = (V, E)$ with vertex set $V = \{u_1, \dots, u_m, v_1, \dots, v_m\}$ and edge set $E = \{\{u_i, u_{i+1}\}, \{u_i, v_{i+1}\}, \{v_i, v_{i+1}\}, \{v_i, u_{i+1}\} : i \in [m]\}$, where addition
is modulo $m$. Further we call the subgraph induced by the 4-tuple $(u_i, v_i, u_{i+1}, v_{i+1})$ a face of the
ribbon, and we call the ordered pairs $(u_i, v_i)$, $i \in [m]$, couples of the ribbon.


Each face of the ribbon has 4 edges, hence there are $\binom{4}{\eta}$ ways to remove $4 - \eta$ edges from the
face. We define a ribbon of class $\eta$, type $\nu$ and length $2m$ as follows.

Definition 4.8. For $1 \le \eta \le 4$ and $1 \le \nu \le \binom{4}{\eta}$, we define a ribbon of length $2m$, class $\eta$ and
type $\nu$ to be the graph obtained from a ribbon of length $2m$ by keeping $\eta$ edges in each face of the
ribbon, so that the following happens. The subgraphs induced by the tuples $(u_{2i-1}, v_{2i-1}, u_{2i}, v_{2i})$
and $(u_{2i+1}, v_{2i+1}, u_{2i}, v_{2i})$ for $i \ge 1$ are faces of class $\eta$ and type $\nu$ as shown in Table 1.

For brevity, we write $(\eta, \nu)$-ribbon to denote a ribbon of class $\eta$ and type $\nu$.

Definition 4.9. An $(\eta, \nu)$-star ribbon $S = (V, E)$ of length $2m$ is a graph formed from an $(\eta, \nu)$-ribbon
$R = (V', E')$ of length $2m$ by the following process. For each face $(u_i, v_i, u_{i+1}, v_{i+1})$ we identify either
the vertex pair $(u_i, u_{i+1})$ or the pair $(v_i, v_{i+1})$ and delete the self loop formed, if any, from the edge
set. Note here that the choice of the pair identified can differ across faces of $R$.

We let $\mathcal{S}^m_{\eta,\nu}$ denote this collection of $(\eta, \nu)$-star ribbons.

Definition 4.10. A labeled graph is a pair $(F = (V, E), \ell)$ where $F$ is a graph and $\ell : V \to [n]$
maps the vertices of the graph to labels in $[n]$. We define a valid labeling to be one that satisfies
the following conditions:

1. Every couple of vertices $(u, v)$ in the graph satisfies $\ell(u) < \ell(v)$.

2. For every edge $e = \{v_1, v_2\} \in E$, $\ell(v_1) \ne \ell(v_2)$.

A labeling of $F$ is called contributing if, in addition to being valid, the following happens. For every
edge $e = \{u, v\} \in E$, there exists an edge $e' = \{u', v'\} \ne e$ such that $\{\ell(u), \ell(v)\} = \{\ell(u'), \ell(v')\}$.
In other words, a labeling is contributing if it is valid and has the property that every labeled edge
occurs at least twice in $F$.

Remark 4.11. Suppose $F$ is one of the graphs defined above and $C$ is a face of $F$. We write,
with slight abuse of notation, $C \subseteq F$ to denote "a face $C$ of the graph $F$". Furthermore, to lighten
notation, we will often write $e \in F$ for an edge $e$ in the graph $F$.

Definition 4.12. Let $\mathcal{L}(F)$ denote the set of valid labelings of a graph $F = (V, E)$ and $\mathcal{L}_2(F)$
denote the set of contributing labelings. Further, we define

$$v_*(F) = \max_{\ell \in \mathcal{L}_2(F)} |\mathrm{range}(\ell)| , \tag{114}$$

where $\mathrm{range}(\ell) = \{i \in [n] : i = \ell(u),\ u \text{ is a vertex in } F\}$.

The following is a simple and general moment method lemma. 

Lemma 4.13. Given a matrix $X \in \mathbb{R}^{n'\times n'}$, suppose that there exist constants $c_1, c_2, c_3, c_4, c_5 \ge 0$
satisfying $c_2 \ge c_4$ and, for any integer $r > 0$:

$$\mathbb{E}\,\mathrm{Tr}\{(X^{\sf T} X)^r\} \le \binom{n}{c_1 r + c_2}(c_5)^{2r}(c_1 r + c_2)^{c_3 r + c_4} . \tag{115}$$

Then, for every $n$ large enough, with probability exceeding $1 - n^{-(\Gamma - c_2)/2}$ we have that

$$\|X\|_2 \le c_5\sqrt{\exp(c_1\Gamma)\, n^{c_1}(\log n)^{c_3 - c_1}} . \tag{116}$$

Proof. By rescaling $X$ we can assume that $c_5 = 1$. Since $\mathrm{Tr}\{(X^{\sf T} X)^r\} = \sum_i \sigma_i(X)^{2r}$, where $\sigma_i(X)$
are the singular values of $X$ ordered as $\sigma_1(X) \ge \sigma_2(X) \ge \dots \ge \sigma_{n'}(X)$, we have that:

$$\|X\|_2^{2r} = \sigma_1(X)^{2r} \le \mathrm{Tr}\{(X^{\sf T} X)^r\} . \tag{117}$$

Then, by Markov's inequality and the given assumption:

$$\mathbb{P}\{\|X\|_2 \ge t\} \le \mathbb{P}\{\mathrm{Tr}\{(X^{\sf T} X)^r\} \ge t^{2r}\} \tag{118}$$
$$\le t^{-2r}\,\mathbb{E}\,\mathrm{Tr}\{(X^{\sf T} X)^r\} \tag{119}$$
$$\le t^{-2r}\binom{n}{c_1 r + c_2}(c_1 r + c_2)^{c_3 r + c_4} . \tag{120}$$

Using $\binom{n}{k} \le (ne/k)^k$ we have:

$$\mathbb{P}\{\|X\|_2 \ge t\} \le t^{-2r}(ne)^{c_1 r + c_2}(c_1 r + c_2)^{(c_3 - c_1) r + c_4 - c_2} \tag{121}$$
$$= \exp\{(c_1 r + c_2)(\log n + 1) + ((c_3 - c_1) r + c_4 - c_2)\log(c_1 r + c_2) - 2r\log t\} . \tag{122}$$

Setting $r = \lceil(\log n - c_2)/c_1\rceil$ and using $c_2 \ge c_4$ we obtain the bound:

$$\mathbb{P}\{\|X\|_2 \ge t\} \le \exp\{\log n(\log n + 1) + (c_3/c_1 - 1)(\log n)\log\log n - (\log n - c_2)\log(t^{2/c_1})\} \tag{123}$$
$$\le \exp\{\log n\,\log\big(ne(\log n)^{c_3/c_1 - 1}\big) - (\log n - c_2)\log(t^{2/c_1})\} . \tag{124}$$

We can now set $t = \{\exp(\Gamma)\, n(\log n)^{c_3/c_1 - 1}\}^{c_1/2}$, whereupon the bound on the right hand side is at
most $n^{-(\Gamma - c_2)/2}$ for every $n$ large enough. This yields the claim of the lemma. □

The next lemma specializes the previous one to the type of random matrices we will be interested
in.

Lemma 4.14. For a matrix $X \in \mathbb{R}^{m\times n}$, suppose there exists a sequence of graphs $G_X(r)$ with
vertex and edge sets $V_r$, $E_r$ respectively, a set $\mathcal{L}(G_X(r))$ of labelings $\ell : V_r \to [n]$ and a constant $\beta > 0$
such that:

$$\mathrm{Tr}\{(X^{\sf T} X)^r\} = \beta^{2r}\sum_{\ell\in\mathcal{L}(G_X(r))}\ \prod_{e\in G_X(r)} g_{\ell(e)} , \tag{125}$$

where, for $e = \{u, v\}$, $\ell(e) = \{\ell(u), \ell(v)\}$. Let $\mathcal{L}_2(G_X(r)) \subseteq \mathcal{L}(G_X(r))$ denote the subset of contributing
labelings (i.e. the set of labelings $\ell \in \mathcal{L}(G_X(r))$ such that every labeled edge in $G_X(r)$ is
repeated at least twice). Further define $v(r)$ and $v_*(r)$ by:

$$v(r) = |V_r| , \tag{126}$$
$$v_*(r) = v_*(G_X(r)) . \tag{127}$$

Then

$$\mathbb{E}\,\mathrm{Tr}\{(X^{\sf T} X)^r\} \le \beta^{2r}\,|\mathcal{L}_2(G_X(r))| \tag{128}$$
$$\le \beta^{2r}\binom{n}{v_*(r)}\, v_*(r)^{v(r)} . \tag{129}$$

Proof. By rescaling $X$ it suffices to show the case $\beta = 1$. Taking expectations on either side of
Eq. (125) we have that:

$$\mathbb{E}\,\mathrm{Tr}\{(X^{\sf T} X)^r\} = \sum_{\ell\in\mathcal{L}(G_X(r))} \mathbb{E}\Big\{\prod_{e\in G_X(r)} g_{\ell(e)}\Big\} . \tag{130}$$

The variables $g_{ij}$ are centered, independent and bounded by 1. Hence the only terms that do
not vanish in the summation above correspond to labelings $\ell$ wherein every labeled edge occurs at
least twice, i.e. precisely when $\ell \in \mathcal{L}_2(G_X(r))$. By the boundedness of $g_{\ell(e)}$, the contribution of
each non-vanishing term is at most 1, hence

$$\mathbb{E}\,\mathrm{Tr}\{(X^{\sf T} X)^r\} \le |\mathcal{L}_2(G_X(r))| . \tag{131}$$

It now remains to prove that $|\mathcal{L}_2(G_X(r))| \le \binom{n}{v_*(r)}\, v_*(r)^{v(r)}$. By definition, $\ell$ can map the vertices
in $V_r$ to at most $v_*(r)$ distinct labels. There are at most $\binom{n}{v_*(r)}$ distinct ways to pick these labels
in $[n]$, and at most $v_*(r)^{v(r)}$ ways to assign the $v_*(r)$ labels to $v(r)$ vertices, yielding the required
bound. □


Lemma 4.15. Consider the setting of Lemma 4.14. If we additionally have

$$v_*(r) \le c_1 r + c_2 , \tag{132}$$
$$v(r) = c_3 r + c_4 , \tag{133}$$

where $c_3 \le 2c_1$, then $\|X\|_2 \lesssim \beta\,\tilde{n}^{c_1/2}$ with probability at least $1 - n^{-5}$.

Proof. The proof follows by combining Lemmas 4.14 and 4.13. □


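4.5 The expected values $\mathbb{E}\{H_{22}\}$, $\mathbb{E}\{H_{12}\}$

In this section we characterize the eigenstructure of the expectations $\mathbb{E}\{H_{22}\}$, $\mathbb{E}\{H_{12}\}$. These can be
viewed as linear operators on $\mathbb{R}^{\binom{[n]}{2}}$ that are invariant under the action of permutations^1 on $\mathbb{R}^{\binom{[n]}{2}}$.
By Schur's Lemma [Ser77], their eigenspace decomposition corresponds to the decomposition of $\mathbb{R}^{\binom{[n]}{2}}$
in irreducible representations of the group of permutations. This is given by

$$\mathbb{R}^{\binom{[n]}{2}} = \mathbb{V}_0 \oplus \mathbb{V}_1 \oplus \mathbb{V}_2 ,$$

where

$$\mathbb{V}_0 = \{v \in \mathbb{R}^{\binom{[n]}{2}} : \exists u \in \mathbb{R} \text{ s.t. } v_{\{i,j\}} = u \text{ for all } i < j\} , \tag{134}$$
$$\mathbb{V}_1 = \{v \in \mathbb{R}^{\binom{[n]}{2}} : \exists u \in \mathbb{R}^n \text{ s.t. } \langle 1_n, u\rangle = 0 \text{ and } v_{\{i,j\}} = u_i + u_j \text{ for all } i < j\} , \tag{135}$$
$$\mathbb{V}_2 = (\mathbb{V}_0 \oplus \mathbb{V}_1)^{\perp} . \tag{136}$$

^1 A permutation $\sigma : [n] \to [n]$ acts on $\mathbb{R}^{\binom{[n]}{2}}$ by permuting the indices in $\binom{[n]}{2}$ in the obvious way, namely
$\sigma(\{i, j\}) = \{\sigma(i), \sigma(j)\}$.

An alternative approach to defining the spaces $\mathbb{V}_a$ is to let $\mathbb{V}_0 = \mathrm{span}(v^0)$, $\mathbb{V}_1 = \mathrm{span}(v^1_i,\ i = 1, \dots, n)$, $\mathbb{V}_2 = \mathrm{span}(v^2_{ij},\ 1 \le i < j \le n)$, where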


K)a = 


(v\)a = 


K)a = 



if A = {i, ■} 


1 

S' 

1 

S' 

-2) 

/ n— 3 

V n—1 

II 

Vo. 

1 /n- 3 

if A = {i, •} 1 

n—2 Y n—1 

1 /n- 3 

( n ~ 2 ) V n ~ 1 

otherwise. 


(137) 

(138) 


(139) 


Notice that $\dim(\mathbb{V}_0) = 1$, $\dim(\mathbb{V}_1) = n - 1$, $\dim(\mathbb{V}_2) = n(n-3)/2$, and that $\{v^1_i\}_{i\in[n]}$, $\{v^2_{ij}\}_{i<j\in[n]}$
are overcomplete sets. For $a \in \{0, 1, 2\}$, we denote by $V_a$ the matrix whose rows are given by this
overcomplete basis of $\mathbb{V}_a$.

It is straightforward to check that the two definitions of the orthogonal decomposition $\mathbb{R}^{\binom{[n]}{2}} = \mathbb{V}_0 \oplus \mathbb{V}_1 \oplus \mathbb{V}_2$ given above coincide. We let $\mathcal{P}_a \in \mathbb{R}^{\binom{[n]}{2}\times\binom{[n]}{2}}$ denote the orthogonal projector on the
space $\mathbb{V}_a$.

The following proposition gives the eigenstructure of $\mathbb{E}\{H_{22}\}$.

Proposition 4.16. The matrix $\mathbb{E}\{H_{22}\}$ has the following spectral decomposition

$$\mathbb{E}\{H_{22}\} = \lambda_0\mathcal{P}_0 + \lambda_1\mathcal{P}_1 + \lambda_2\mathcal{P}_2 , \tag{140}$$

where

$$\lambda_0 = \alpha_2 + 2(n-2)\alpha_3 p + \frac{(n-2)(n-3)}{2}\alpha_4 p^4 - \frac{n(n-1)}{2}\alpha_2^2 , \tag{141}$$
$$\lambda_1 = \alpha_2 + (n-4)\alpha_3 p - (n-3)\alpha_4 p^4 , \tag{142}$$
$$\lambda_2 = \alpha_2 - 2\alpha_3 p + \alpha_4 p^4 . \tag{143}$$

Proof. It is straightforward to verify that the vectors $v^a_A$ defined above are eigenvectors of $\mathbb{E}\{H_{22}\}$.
The eigenvalues are then given by $\lambda_a = \langle v^a_A, \mathbb{E}\{H_{22}\} v^a_A\rangle$ for an arbitrary choice of $A = \{i\}$ or
$A = \{i, j\}$. □

Remark 4.17. The above eigenvalues can also be computed using [MW13b], which relies on the
theory of association schemes. We preferred to present a direct and self-contained derivation.
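The spectral decomposition of Proposition 4.16 can be sanity-checked numerically for small $n$. The sketch below assumes that the entries of $\mathbb{E}\{H_{22}\}$ depend only on $|A \cap B|$, taking the values $\alpha_2 - \alpha_2^2$, $\alpha_3 p - \alpha_2^2$ and $\alpha_4 p^4 - \alpha_2^2$ for intersections of size 2, 1 and 0, as obtained by taking expectations in the entrywise definition of $H_{22}$. The function names and the numerical parameter values are arbitrary choices for this illustration.

```python
import itertools
import numpy as np

def expected_H22(n, a2, a3, a4, p):
    """Matrix indexed by pairs {i,j}; entries depend only on |A ∩ B| (assumed form)."""
    pairs = list(itertools.combinations(range(n), 2))
    M = np.empty((len(pairs), len(pairs)))
    for s, A in enumerate(pairs):
        for t, B in enumerate(pairs):
            k = len(set(A) & set(B))
            if k == 2:
                M[s, t] = a2 - a2 ** 2
            elif k == 1:
                M[s, t] = a3 * p - a2 ** 2
            else:
                M[s, t] = a4 * p ** 4 - a2 ** 2
    return M

def predicted_eigenvalues(n, a2, a3, a4, p):
    """lambda_0, lambda_1, lambda_2 as in Eqs. (141)-(143)."""
    lam0 = (a2 + 2 * (n - 2) * a3 * p + (n - 2) * (n - 3) / 2 * a4 * p ** 4
            - n * (n - 1) / 2 * a2 ** 2)
    lam1 = a2 + (n - 4) * a3 * p - (n - 3) * a4 * p ** 4
    lam2 = a2 - 2 * a3 * p + a4 * p ** 4
    return lam0, lam1, lam2

n, a2, a3, a4, p = 6, 0.3, 0.2, 0.1, 0.5
numeric = np.sort(np.linalg.eigvalsh(expected_H22(n, a2, a3, a4, p)))
lam0, lam1, lam2 = predicted_eigenvalues(n, a2, a3, a4, p)
# Multiplicities 1, n-1 and n(n-3)/2 match the dimensions of the three subspaces.
predicted = np.sort([lam0] + [lam1] * (n - 1) + [lam2] * (n * (n - 3) // 2))
print(np.allclose(numeric, predicted))
```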


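We now have a similar proposition for $\mathbb{E}\{H_{12}\} \in \mathbb{R}^{\binom{[n]}{1}\times\binom{[n]}{2}}$. More precisely, we decompose $\mathbb{R}^{\binom{[n]}{1}}$
in $\mathrm{span}(1_n)$ and its orthogonal complement, and $\mathbb{R}^{\binom{[n]}{2}} = \mathbb{V}_0 \oplus \mathbb{V}_1 \oplus \mathbb{V}_2$ as above.

Proposition 4.18. The following hold for all $n$ large enough:

$$Q_n^{\perp}\,\mathbb{E}\{H_{12}\}\,\mathcal{P}_0 = 0 ,$$
$$\|Q_n^{\perp}\,\mathbb{E}\{H_{12}\}\,\mathcal{P}_1\|_2 \le \sqrt{n}\,\alpha_2 ,$$
$$Q_n^{\perp}\,\mathbb{E}\{H_{12}\}\,\mathcal{P}_2 = 0 ,$$
$$\|Q_n\,\mathbb{E}\{H_{12}\}\,\mathcal{P}_0\|_2 \le n^{3/2}\alpha_3 p^2 + 2\sqrt{n}\,\alpha_2 ,$$
$$Q_n\,\mathbb{E}\{H_{12}\}\,\mathcal{P}_1 = 0 ,$$
$$Q_n\,\mathbb{E}\{H_{12}\}\,\mathcal{P}_2 = 0 .$$

Proof. For $A \in \binom{[n]}{1}$ and $B \in \binom{[n]}{2}$:

$$(\mathbb{E}\{H_{12}\})_{A,B} = \begin{cases} \alpha_3 p^2 - \alpha_1\alpha_2 & \text{if } |A \cap B| = 0 , \\ \alpha_2 - \alpha_1\alpha_2 & \text{if } |A \cap B| = 1 . \end{cases}$$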

Recall from the definition of the space Vi = spandu^j^unu)- We can write E{H\ 2 } as: 

wtu i ( n f l )(a 3 p 2 -a ia2 ) + {n- l){a 2 - a 3 p 2 ) T /(n - l)(n-2) 2 

E{H i 2 | = —^-.-- Iji^o + \ -(«2 - OL 3 p )V\. 


This implies all but the second and the fourth claims immediately as V{Pq = V\P 2 = 0, Q n V\ = 0 
and Qnl n = 0. For the second claim, the above decomposition yields: 


QnE{Hi 2 }V\ = max 

2 ^ GVi :|| x || 2 <1 


(n — 1 )(n — 2) 


(a 2 - a 3 p 2 )\\x 


= \j [U ^ 2) -(«2 - a 3 p 2 )^\ m UViV?). 

Since = —l/(n — 1) when A A' and 1 otherwise, we have that: 

FlF, T = In - -2— l„(l n ) T , 

n — 1 n — 1 

hence A max (FiV, T ) = n/(n — 1). This implies that: 

Q^E{Hi 2 }Vi = \Jn — 2(a 2 - « 3 p 2 ) < y/na 2 . 

For the fourth claim, the expression for E{Hi- 2 } above yields that: 

||S„E{ff 12 }Poll 2 = 


< ( n 2 X ) a 3P 2 + (n - l)a 2 

/ n —1 /n—1 

V 2 V 2 

< n^/na 3 p 2 + 2\fna 2 . 
4.6 Controlling $H_{11} - \mathbb{E}\{H_{11}\}$

The block $H_{11}$ is a linear combination of the identity and the adjacency matrix of $G$. Hence, its
spectral properties are well understood, since the seminal work of Füredi-Komlós [FK81]. While
the next proposition could be proved using these results, we present a self-contained proof for
pedagogical reasons, as the same argument will be repeated several times later for more complex
examples.

Proposition 4.19. Suppose that $\alpha$ satisfies:

$$\frac{\alpha_1}{2} - \alpha_2 p \gtrsim \alpha_2\tilde{n}^{1/2} , \tag{159}$$
$$\alpha_2 p - \alpha_1^2 \ge 0 , \quad \alpha_1 \ge 0 . \tag{160}$$

Then with probability at least $1 - n^{-5}$:

$$H_{11} \succeq 0 , \tag{161}$$
$$H_{11}^{-1} \preceq \frac{1}{n(\alpha_2 p - \alpha_1^2)}\, Q_n + \frac{2}{\alpha_1}\, Q_n^{\perp} . \tag{162}$$

Proof. First, note that:

$$\mathbb{E}\{H_{11}\} = (\alpha_1 - \alpha_2 p)\, I_n + (\alpha_2 p - \alpha_1^2)\, n\, Q_n . \tag{163}$$

Furthermore, for $A, B \in \binom{[n]}{1}$, $A \ne B$, $(H_{11} - \mathbb{E}\{H_{11}\})_{A,B} = \alpha_2 g_{AB}$. Here, we identify elements of
$\binom{[n]}{1}$ with elements of $[n]$ in the natural way. Thus, expanding $\mathrm{Tr}\{((H_{11} - \mathbb{E}\{H_{11}\})^{\sf T}(H_{11} - \mathbb{E}\{H_{11}\}))^m\}$
we obtain:

$$\mathrm{Tr}\{((H_{11} - \mathbb{E}\{H_{11}\})^{\sf T}(H_{11} - \mathbb{E}\{H_{11}\}))^m\} = \alpha_2^{2m}\sum_{A_1 \dots A_m,\, A'_1 \dots A'_m}\ \prod_{\ell=1}^{m} g_{A_\ell A'_\ell}\, g_{A_{\ell+1} A'_\ell} , \tag{164}$$

where we set $A_{m+1} = A_1$. Let $D(m)$ be a cycle of length $2m$, $V_D$, $E_D$ be its vertex and edge
sets respectively, and $\ell$ be a labeling that assigns to the vertices labels $A_1, A'_1, A_2, A'_2, \dots, A_m, A'_m$ in
order. Then the summation over indices $A_1 \dots A_m, A'_1 \dots A'_m$ can be expressed as a sum over such
labelings of the cycle $D(m)$, i.e.:

$$\mathrm{Tr}\{((H_{11} - \mathbb{E}\{H_{11}\})^{\sf T}(H_{11} - \mathbb{E}\{H_{11}\}))^m\} = \alpha_2^{2m}\sum_{\ell\in\mathcal{L}(D)}\ \prod_{e=\{u,v\}\in E_D} g_{\ell(u)\ell(v)} . \tag{165}$$

Let $\mathcal{L}_2(D(m))$ denote the set of contributing labelings of $D(m)$. By Lemma 4.15, it suffices to
show that $\max_{\ell\in\mathcal{L}_2(D(m))} |\mathrm{range}(\ell)| \le m + 1$. Since for a contributing labeling $\ell$ of $D(m)$ every
edge must occur at least twice, there are at most $m$ distinct labeled edges in $D(m)$. If we
consider the graph obtained from $(D, \ell)$ by identifying in $D$ the vertices with the same label, we
obtain a connected graph with at most $m$ edges, hence at most $m + 1$ unique vertices. This implies
that there are at most $m + 1$ unique labels in the range of a contributing labeling $\ell$. Hence with
probability at least $1 - n^{-5}$:

$$\|H_{11} - \mathbb{E}\{H_{11}\}\|_2 \lesssim \alpha_2\tilde{n}^{1/2} . \tag{166}$$

Hence with the same probability:

$$H_{11} \succeq (\alpha_1 - \alpha_2 p - C\alpha_2\tilde{n}^{1/2})\, I_n + (\alpha_2 p - \alpha_1^2)\, n\, Q_n , \tag{167}$$

for some constant $C$. Under the condition $\alpha_1/2 - \alpha_2 p \ge \alpha_2\tilde{n}^{1/2}$ (with a sufficiently large constant
which we suppress) we have that:

$$H_{11} \succeq \frac{\alpha_1}{2}\, I_n + (\alpha_2 p - \alpha_1^2)\, n\, Q_n , \tag{168}$$

and hence, since $I_n = Q_n + Q_n^{\perp}$ and $\alpha_1 \ge 0$,

$$H_{11} \succeq \frac{\alpha_1}{2}\, Q_n^{\perp} + (\alpha_2 p - \alpha_1^2)\, n\, Q_n . \tag{169}$$

Inverting this inequality yields the claim for $H_{11}^{-1}$. This completes the proof of the proposition. □
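The combinatorial bound used in the proof above, namely that a contributing labeling of the cycle $D(m)$ uses at most $m + 1$ distinct labels, can be confirmed by brute force for small $m$. The sketch below is an illustration only: the function name is hypothetical, labels are taken in `range(n)` rather than $[n]$, and a labeling is treated as valid when the two endpoints of every edge carry different labels (the cycle has no couples, so no ordering constraint applies).

```python
from itertools import product
from collections import Counter

def max_contributing_range(m, n):
    """Largest number of distinct labels over contributing labelings of the 2m-cycle."""
    length = 2 * m
    best = 0
    for lab in product(range(n), repeat=length):
        edges = [frozenset((lab[i], lab[(i + 1) % length])) for i in range(length)]
        if any(len(e) == 1 for e in edges):
            continue                      # an edge with equal endpoint labels: not valid
        counts = Counter(edges)
        if all(c >= 2 for c in counts.values()):
            best = max(best, len(set(lab)))   # contributing: every labeled edge repeats
    return best

for m, n in [(1, 4), (2, 4), (3, 5)]:
    print(m, max_contributing_range(m, n))
```

The maximum is attained by labelings that walk along a path with $m$ edges and back, so that the labeled graph is a tree traversed twice.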


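4.7 Controlling $H_{22} - \mathbb{E}\{H_{22}\}$

The following proposition is the key result of this subsection.

Proposition 4.20. With probability at least $1 - 25n^{-5}$ the following hold:

For $a \in \{0, 1\}$:
$$\|\mathcal{P}_a(H_{22} - \mathbb{E}\{H_{22}\})\mathcal{P}_a\|_2 \lesssim \alpha_3\tilde{n}^{1/2} + \alpha_4\tilde{n}^{3/2} , \tag{170}$$
$$\|\mathcal{P}_2(H_{22} - \mathbb{E}\{H_{22}\})\mathcal{P}_2\|_2 \lesssim \alpha_3\tilde{n}^{1/2} + \alpha_4\tilde{n} . \tag{171}$$
For $a \ne b \in \{0, 1, 2\}$:
$$\|\mathcal{P}_a(H_{22} - \mathbb{E}\{H_{22}\})\mathcal{P}_b\|_2 \lesssim \alpha_3\tilde{n}^{1/2} + \alpha_4\tilde{n}^{3/2} . \tag{172}$$

Recall that, writing $h(A)$ and $t(A)$ for the head and tail of $A$:

$$(H_{22})_{A,B} = \begin{cases} \alpha_2 - \alpha_2^2 & \text{if } A = B , \\ -\alpha_2^2 + \alpha_3(p + g_{t(A)t(B)}) & \text{if } h(A) = h(B),\ A \ne B , \\ -\alpha_2^2 + \alpha_3(p + g_{h(A)t(B)}) & \text{if } t(A) = h(B),\ A \ne B , \\ -\alpha_2^2 + \alpha_3(p + g_{t(A)h(B)}) & \text{if } h(A) = t(B),\ A \ne B , \\ -\alpha_2^2 + \alpha_3(p + g_{h(A)h(B)}) & \text{if } t(A) = t(B),\ A \ne B , \\ -\alpha_2^2 + \alpha_4(p + g_{h(A)h(B)})(p + g_{h(A)t(B)})(p + g_{t(A)h(B)})(p + g_{t(A)t(B)}) & \text{if } |A \cap B| = 0 . \end{cases} \tag{173}$$

When $|A \cap B| = 0$ (last case above) we can expand $H_{A,B}$ as a sum of sixteen terms:

$$H_{A,B} = \alpha_4(p + g_{h(A)h(B)})(p + g_{h(A)t(B)})(p + g_{t(A)h(B)})(p + g_{t(A)t(B)}) - \alpha_2^2 \tag{174}$$
$$\begin{aligned} = {} & (\alpha_4 p^4 - \alpha_2^2) + \alpha_4 p^3\big(g_{h(A)h(B)} + g_{h(A)t(B)} + g_{t(A)h(B)} + g_{t(A)t(B)}\big) \\ & + \alpha_4 p^2\big(g_{h(A)h(B)}g_{h(A)t(B)} + g_{h(A)h(B)}g_{t(A)h(B)} + g_{h(A)h(B)}g_{t(A)t(B)} \\ & \qquad + g_{h(A)t(B)}g_{t(A)h(B)} + g_{h(A)t(B)}g_{t(A)t(B)} + g_{t(A)h(B)}g_{t(A)t(B)}\big) \\ & + \alpha_4 p\big(g_{h(A)h(B)}g_{h(A)t(B)}g_{t(A)h(B)} + g_{h(A)h(B)}g_{h(A)t(B)}g_{t(A)t(B)} \\ & \qquad + g_{h(A)h(B)}g_{t(A)h(B)}g_{t(A)t(B)} + g_{h(A)t(B)}g_{t(A)h(B)}g_{t(A)t(B)}\big) \\ & + \alpha_4\, g_{h(A)h(B)}g_{h(A)t(B)}g_{t(A)h(B)}g_{t(A)t(B)} . \end{aligned} \tag{175}$$

Compactly, we can represent the above summation as follows. Each term above is indexed by
a pair $(\eta, \nu)$ where $0 \le \eta \le 4$ denotes the number of variables $g_{ij}$ occurring in the product, and
$\nu \le \binom{4}{\eta}$ determines exactly which $\eta$-tuple of $g$ variables occurs. For instance, when $\eta = 1$, we have
$\binom{4}{1}$ terms $\alpha_4 p^3 g_{h(A)h(B)}$, $\alpha_4 p^3 g_{h(A)t(B)}$, $\alpha_4 p^3 g_{t(A)h(B)}$, $\alpha_4 p^3 g_{t(A)t(B)}$. Equivalently, if $R_{A,B}(\eta, \nu)$ is a
labeled $(\eta, \nu)$-ribbon with exactly one face and vertices labeled $h(A), t(A), h(B), t(B)$ in order, each
term corresponds to one specific class and type of ribbon, i.e.

$$H_{A,B} = \sum_{\eta,\nu} \alpha_4 p^{4-\eta}\prod_{\{i,j\}\in R_{A,B}(\eta,\nu)} g_{ij} .$$

The exact mapping of the pair $(\eta, \nu)$ to the choice of edges in $R_{A,B}(\eta, \nu)$ is given in Table 1. With
a slight abuse of terminology, we refer to $\eta$ as the class and $\nu$ as the type of the term. We define the
matrices $J_{\eta,\nu}$ (for $\eta = 1, 2, 3, 4$ and $\nu = 1, \dots, \binom{4}{\eta}$) and $K$ as follows.

$$(J_{\eta,\nu})_{A,B} = \begin{cases} \alpha_4 p^{4-\eta}\prod_{\{i,j\}\in R_{A,B}(\eta,\nu)} g_{ij} & \text{if } |A \cap B| = 0 , \\ 0 & \text{otherwise.} \end{cases} \tag{176}$$

$$K_{A,B} = \begin{cases} \alpha_3 g_{t(A)t(B)} & \text{if } h(A) = h(B),\ A \ne B , \\ \alpha_3 g_{h(A)t(B)} & \text{if } t(A) = h(B),\ A \ne B , \\ \alpha_3 g_{t(A)h(B)} & \text{if } h(A) = t(B),\ A \ne B , \\ \alpha_3 g_{h(A)h(B)} & \text{if } t(A) = t(B),\ A \ne B , \\ 0 & \text{otherwise.} \end{cases} \tag{177}$$

The matrices $J_{\eta,\nu}$ vanish on the set of entries $A, B$ where $A$ and $B$ have non-empty intersection.
This causes the failure of certain useful spectral properties with respect to the spaces $\mathbb{V}_0, \mathbb{V}_1, \mathbb{V}_2$.
Consequently, for our proof, it is useful to define the matrices $\tilde{J}_{\eta,\nu}$ that do not have this constraint:

$$(\tilde{J}_{\eta,\nu})_{A,B} = \alpha_4 p^{4-\eta}\prod_{\{i,j\}\in R_{A,B}(\eta,\nu)} g_{ij} . \tag{178}$$

Here we ignore the constraint that $A, B$ do not intersect, and follow the convention that $g_{ii} = 0$ for
every $i \in [n]$.

Thus, with Eq. (173) we arrive at the following expansion:

$$H_{22} - \mathbb{E}\{H_{22}\} = K + \sum_{\eta=1}^{4}\sum_{\nu=1}^{\binom{4}{\eta}} J_{\eta,\nu} \tag{179}$$
$$\begin{aligned} = {} & K + J_{2,1} + J_{2,6} + J_{4,1} + \sum_{\nu=1}^{4} J_{3,\nu} + \sum_{\nu=1}^{4}(J_{1,\nu} - \tilde{J}_{1,\nu}) + \sum_{\nu=2}^{5}(J_{2,\nu} - \tilde{J}_{2,\nu}) \\ & + \sum_{\nu=1}^{4}\tilde{J}_{1,\nu} + \sum_{\nu=2}^{5}\tilde{J}_{2,\nu} . \end{aligned} \tag{180}$$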

We now prove a sequence of lemmas regarding the spectral properties of the matrices $K$, $J_{\eta,\nu}$.
The first one concerns the cases $\eta = 2$, $\nu = 1, 6$ and $\eta = 4$, $\nu = 1$.

Lemma 4.21. With probability at least $1 - 3n^{-5}$, we have that:

$$\|J_{2,1} + J_{2,6} + J_{4,1}\|_2 \lesssim \alpha_4\tilde{n} . \tag{181}$$

Proof. By the triangle inequality:

$$\|J_{2,1} + J_{2,6} + J_{4,1}\|_2 \le \|J_{2,1}\|_2 + \|J_{2,6}\|_2 + \|J_{4,1}\|_2 . \tag{182}$$

We prove that with probability at least $1 - n^{-5}$

$$\|J_{\eta,\nu}\|_2 \lesssim \alpha_4\tilde{n} , \tag{183}$$

for $(\eta, \nu) = (2, 1), (2, 6), (4, 1)$. The claim then follows by a union bound.

Let $R(\eta, \nu, m)$ denote an $(\eta, \nu)$-ribbon of length $2m$. Then, by expanding the product we have:

$$\mathrm{Tr}\{(J_{\eta,\nu}^{\sf T} J_{\eta,\nu})^m\} = \sum_{\ell\in\mathcal{L}(R(\eta,\nu,m))} (\alpha_4 p^{4-\eta})^{2m}\prod_{e\in R(\eta,\nu,m)} g_{\ell(e)} . \tag{184}$$

Here we write $\ell(e)$ in place of the pair $\ell(u), \ell(v)$ when $u, v$ are the end vertices of $e$. Since $R(\eta, \nu, m)$
has $4m + 2$ vertices, by Lemma 4.15 it suffices to prove that $\max_{\ell\in\mathcal{L}_2(R(\eta,\nu,m))} |\mathrm{range}(\ell)| \le 2m + 2$.

We first prove this for the case $\eta = 2$ and $\nu = 1, 6$. Let $\ell$ be a contributing labeling of the
ribbon $R(\eta, \nu, m)$ of length $2m$. Let $G(\eta, \nu)$ denote the graph obtained by identifying in $R(\eta, \nu, m)$
every vertex with the same label according to $\ell$. We have:

$$\#\,\text{connected components in } G(\eta, \nu) \le \#\,\text{connected components in } R(\eta, \nu, m) = 2 , \tag{185}$$
$$\#\,\text{edges in } G(\eta, \nu) \le \frac{\#\,\text{edges in } R(\eta, \nu, m)}{2} = 2m . \tag{186}$$

It follows that there are at most $2m + 2$ unique vertices in $G(\eta, \nu)$ and hence at most $2m + 2$
unique labels in $\mathrm{range}(\ell)$.

We now prove the condition $\max_{\ell\in\mathcal{L}_2(R(\eta,\nu,m))} |\mathrm{range}(\ell)| \le 2m + 2$ for $\eta = 4$, $\nu = 1$ by induction on
$m$. The base case is $m = 1$ (a ribbon of length 2), wherein it is obvious that a contributing
labeling $\ell$ can have at most $4 = 2m + 2$ unique labels. Now, assume the claim is true for ribbons
of length at most $2m$, $m \ge 1$; we will prove it for $R(4, 1, m + 1)$, of length $2m + 2$. Consider any
contributing labeling $\ell$ of $R(4, 1, m + 1)$. We now have the following cases:

1. For every vertex $u \in R(4, 1, m + 1)$, there exists $u' \ne u$ such that $\ell(u') = \ell(u)$.

2. There exists a vertex $u \in R(4, 1, m + 1)$ with a unique label $i = \ell(u)$ and the degree of $u$ is 4.

For case 1, if every label in the range of $\ell$ occurs at least twice in $R(4, 1, m + 1)$, the number of unique
labels is bounded by $2(m + 1)$, since $R(4, 1, m + 1)$ has only $4(m + 1)$ vertices, hence the claim follows.

For case 2, let $(u_1, v_1)$ and $(u_2, v_2)$ be the neighboring couples of $u$. If $u$ is connected to
all of $u_1, v_1, u_2, v_2$, since the edges connected to $u$ must occur twice, it must hold that $\ell(u_1) = \ell(u_2)$
and $\ell(v_1) = \ell(v_2)$ (recall indeed that $\ell(u_1) < \ell(v_1)$, $\ell(u_2) < \ell(v_2)$ by definition of a valid
labeling). Hence, we can contract the ribbon by removing the couple containing $u$ and all its edges, and
identifying the couple $(u_1, v_1)$ with $(u_2, v_2)$. We now obtain a ribbon $R(4, 1, m)$ of length $2m$ and
an induced labeling $\ell'$ thereof which is contributing. By the induction hypothesis, $|\mathrm{range}(\ell')| \le 2m + 2$,
hence $|\mathrm{range}(\ell)| \le |\mathrm{range}(\ell')| + 2 \le 2(m + 1) + 2$. This completes the proof. □

Lemma 4.22. With probability at least $1 - 8n^{-5}$, we have

$$\Big\|\sum_{\nu=1}^{4} J_{3,\nu}\Big\|_2 \lesssim \alpha_4\, p\,\tilde{n} . \tag{187}$$

Proof. By the triangle inequality, it suffices to show that for v E {1,..., 4}, with probability 

1 — 7J _ ( r ~ 2 ^/ 2 : 


7/3,1* 2 - 

(188) 

We prove the above for the case r* = 2. The other case follow from analogous arguments. Firstly, 
define the matrices 7 / 3 ^ G m( [ 2 1 ) x ( [ 2 I ) and Q E M ” 2 *” 2 as follows: 

( J 3 , 2 ){*, 1 }, {fei} — «4 P9ik9il9jh 

Q(i,j),(k,l) 9ik9il9jl ■ 

(189) 

(190) 

Note also that ^2 differs from J 3 ^ only in the entries {i,j},{fc,/} where 
(columns) of Q above are indexed by ordered pairs (i,j) G [n] x [n]. Now we 
V([ n \\ : M n " -E m( [ 2 ] ) by letting, for all i,j E [n], 

j = k. The rows 
define the projector 

(/^(N) (* C )){*,1} x {i,j) ' 

(191) 

Then we have <73,2 = oi^pV ([n]\QVj [n ^ and, consequently, 7 / 3,2 2 < ct^p \\Q 2 - Therefore it suffices 

v 2 1 V 2 )_ 

to bound the latter, which we do again by the moment method. Firstly we define: 

$$U_{(i,j),(k,l)} = \sum_{q\in[n]} g_{iq}\, g_{qk}\, g_{ij}\, \mathbb{I}(j = l), \tag{192}$$

$$D_{(i,j),(k,l)} = \sum_{q\in[n]} g_{jq}\, g_{ql}\, g_{ij}\, \mathbb{I}(i = k). \tag{193}$$


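Then we have, for any integer $m \ge 1$,

$$\begin{aligned}
\mathrm{Tr}\big((Q^{T}Q)^m\big)
&= \sum_{i_1,\dots,i_{2m}\in[n]}\ \sum_{j_1,\dots,j_{2m}\in[n]} Q_{(i_1,j_1),(i_2,j_2)}\, Q^{T}_{(i_2,j_2),(i_3,j_3)}\, Q_{(i_3,j_3),(i_4,j_4)} \cdots Q^{T}_{(i_{2m},j_{2m}),(i_1,j_1)} \\
&= \sum_{i_1,\dots,i_{2m}\in[n]}\ \sum_{j_1,\dots,j_{2m}\in[n]} (g_{i_1i_2}\, g_{j_1j_2}\, g_{i_1j_2}) \cdot (g_{i_2i_3}\, g_{j_2j_3}\, g_{j_2i_3}) \cdot (g_{i_3i_4}\, g_{j_3j_4}\, g_{i_3j_4}) \cdots (g_{i_{2m}i_1}\, g_{j_{2m}j_1}\, g_{j_{2m}i_1}) \\
&= \sum_{i_1,\dots,i_{2m}\in[n]}\ \sum_{j_1,\dots,j_{2m}\in[n]} (g_{i_1i_2}\, g_{i_2i_3}\, g_{i_3i_4} \cdots g_{i_{2m}i_1})\, (g_{j_1j_2}\, g_{j_2j_3}\, g_{j_3j_4} \cdots g_{j_{2m}j_1})\, (g_{i_1j_2}\, g_{j_2i_3}\, g_{i_3j_4} \cdots g_{j_{2m}i_1}) \\
&= \sum_{i_1,\dots,i_{2m}\in[n]}\ \sum_{j_1,\dots,j_{2m}\in[n]} (g_{i_1i_2}\, g_{i_2i_3}\, g_{i_1j_2})\, (g_{j_2j_3}\, g_{j_3j_4}\, g_{j_2i_3})\, (g_{i_3i_4}\, g_{i_4i_5}\, g_{i_3j_4}) \cdots (g_{j_{2m}j_1}\, g_{j_1j_2}\, g_{j_{2m}i_1}).
\end{aligned}$$

Then we have

$$\mathrm{Tr}\big((Q^{T}Q)^m\big) = \mathrm{Tr}\big((UD)^m\big). \tag{194}$$

Hence

$$\|Q\|_2 \le \mathrm{Tr}\big((Q^{T}Q)^m\big)^{1/2m} \le \mathrm{Tr}\big((UD)^m\big)^{1/2m} \le \big(n^2 \|U\|_2^m \|D\|_2^m\big)^{1/2m} \le n^{1/m} \|U\|_2, \tag{195}$$

where in the last step we used the fact that $\|U\|_2 = \|D\|_2$ by symmetry. Since $m$ can be taken
arbitrarily large, we conclude that $\|Q\|_2 \le \|U\|_2$ and we proceed to bound the latter.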
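The index regrouping behind Eq. (194) can be checked on a small instance. The following is only an illustrative sketch: it assumes a symmetric sign matrix $g$ with zero diagonal (the helper names, the seed and the small values of $n$ are hypothetical choices, not taken from the paper), builds $Q$, $U$ and $D$ entrywise from Eqs. (190), (192) and (193), and compares the two traces in exact integer arithmetic.

```python
import itertools
import random

def matmul(A, B):
    """Plain integer matrix product for lists of lists."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[t] * B[t][c] for t in range(inner)) for c in range(cols)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace_power(A, m):
    """Tr(A^m) for a square integer matrix A and an integer m >= 1."""
    P = A
    for _ in range(m - 1):
        P = matmul(P, A)
    return sum(P[r][r] for r in range(len(A)))

def random_symmetric_signs(n, seed=0):
    """Symmetric matrix with +-1 off-diagonal entries and zero diagonal (an assumption)."""
    rng = random.Random(seed)
    g = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            g[i][j] = g[j][i] = rng.choice((-1, 1))
    return g

def build_Q_U_D(g):
    """Q, U, D indexed by ordered pairs (i, j), following Eqs. (190), (192), (193)."""
    n = len(g)
    pairs = list(itertools.product(range(n), repeat=2))
    g2 = matmul(g, g)  # (g^2)[i][k] = sum_q g[i][q] * g[q][k]
    Q = [[g[i][k] * g[i][l] * g[j][l] for (k, l) in pairs] for (i, j) in pairs]
    U = [[g2[i][k] * g[i][j] * (j == l) for (k, l) in pairs] for (i, j) in pairs]
    D = [[g2[j][l] * g[i][j] * (i == k) for (k, l) in pairs] for (i, j) in pairs]
    return Q, U, D

def traces_agree(n, m, seed=0):
    """True if Tr((Q^T Q)^m) equals Tr((U D)^m) for the sampled g."""
    Q, U, D = build_Q_U_D(random_symmetric_signs(n, seed))
    return trace_power(matmul(transpose(Q), Q), m) == trace_power(matmul(U, D), m)
```

Calling `traces_agree(3, 2)` compares both sides for $n = 3$, $m = 2$; the identity is purely algebraic, so it should hold for any symmetric $g$, not only for sign matrices.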

Now let $T \in \mathbb{R}^{n^2\times n^2}$ be the element-wise multiplication by $g$, i.e.

$$T_{(i,j),(k,l)} = g_{ij}\, \mathbb{I}(i = k)\, \mathbb{I}(j = l). \tag{196}$$

Then we have

$$U = T \cdot (g^2 \otimes I_n). \tag{197}$$

Here $g \in \mathbb{R}^{n\times n}$ is the matrix with $(i,j)$ entry being $g_{ij}$. Since $|g_{ij}| \le 1$, we have $\|T\|_2 \le 1$ and
therefore

$$\|Q\|_2 \le \|U\|_2 \le \|T\|_2\, \|g^2 \otimes I\|_2 \le \|g^2 \otimes I\|_2 \le \|g^2\|_2 \le \|g\|_2^2. \tag{198}$$

Finally, similar to Proposition 4.19 we have that $\|g\|_2 \lesssim n^{1/2}$ with probability at least $1 - n^{-5}$, hence
with the same probability:

$$\|\widetilde{\mathcal{J}}_{3,2}\|_2 \lesssim \alpha_4 p n. \tag{199}$$
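The chain in Eq. (198) rests on two standard linear-algebra facts, spelled out here for convenience. This is a sketch under the assumption that $g$ is symmetric; the eigenvalue notation $\lambda_k$ is introduced only for this remark.

```latex
% T is diagonal in the basis indexed by ordered pairs (i,j), with diagonal entries g_{ij}:
\|T\|_2 = \max_{i,j} |g_{ij}| \le 1 .
% If g is symmetric with eigenvalues \lambda_1,\dots,\lambda_n, then g^2 \otimes I_n
% has eigenvalues \lambda_k^2, each with multiplicity n:
\|g^2 \otimes I_n\|_2 = \|g^2\|_2 = \max_k \lambda_k^2 = \|g\|_2^2 .
% Submultiplicativity of the operator norm then gives
\|U\|_2 = \|T\,(g^2 \otimes I_n)\|_2 \le \|T\|_2\, \|g^2 \otimes I_n\|_2 \le \|g\|_2^2 .
```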


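By the triangle inequality $\|\mathcal{J}_{3,2}\|_2 \le \|\widetilde{\mathcal{J}}_{3,2}\|_2 + \|\mathcal{J}_{3,2} - \widetilde{\mathcal{J}}_{3,2}\|_2$, hence to complete the proof we now bound
$\|\mathcal{J}_{3,2} - \widetilde{\mathcal{J}}_{3,2}\|_2$ using the moment method. Recall that $\mathcal{J}_{3,2}$ and $\widetilde{\mathcal{J}}_{3,2}$ differ in the entry $\{i,j\},\{k,\ell\}$
only if $j = k$. Hence: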


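$$\mathrm{Tr}\Big\{ \big( (\mathcal{J}_{3,2} - \widetilde{\mathcal{J}}_{3,2})^{T} (\mathcal{J}_{3,2} - \widetilde{\mathcal{J}}_{3,2}) \big)^m \Big\} = (\alpha_4 p)^{2m} \sum_{\substack{i_1,\dots,i_{2m},\ j_1,\dots,j_{2m} \\ \forall q:\ i_q < j_q}}\ \prod_{q \in \{1,3,\dots,2m-1\}} \Big( g_{i_q i_{q+1}}\, g_{i_q j_{q+1}}\, g_{j_q j_{q+1}}\; g_{i_{q+1} i_{q+2}}\, g_{j_{q+1} i_{q+2}}\, g_{j_{q+1} j_{q+2}} \Big)\, \mathbb{I}(j_1 = i_2 = j_3 = i_4 = \cdots = i_{2m}) \tag{200}$$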

$$= (\alpha_4 p)^{2m} \sum_{\ell \in \mathcal{L}(R(3,2,m))}\ \prod_{e=\{u,v\} \in R(3,2,m)} g_{\ell(u)\ell(v)}. \tag{201}$$


Here, $R(3,2,m)$ is a $(3,2)$-ribbon of length $2m$ and $\mathcal{L}(R(3,2,m))$ is a collection of labelings of
$R(3,2,m)$ satisfying the following criteria:

1. For every couple $(u,v) \in R(3,2,m)$, $\ell(u) < \ell(v)$.

2. Let $(u_1,v_1), (u_2,v_2), \dots, (u_{2m},v_{2m})$ denote the couples in $R(3,2,m)$. Then $\ell(v_1) = \ell(u_2) =
\ell(v_3) = \ell(u_4) = \cdots$.

Let $\mathcal{L}_2(R(3,2,m))$ denote the subset of contributing labelings, i.e. those that satisfy the additional
criterion that every labeled edge is repeated twice. By Lemma 4.15 it suffices to show that
$v_*(R(3,2,m)) = \max_{\ell \in \mathcal{L}_2(R(3,2,m))} |\mathrm{range}(\ell)| \le m+2$. We prove this by induction. For the base case
of $m = 1$, since every edge is repeated twice under a contributing labeling, it is easy to see that
there are at most 3 unique labels. Assume the induction hypothesis that $v_*(R(3,2,m-1)) \le m+1$.
Let $\ell$ be a contributing labeling of $R(3,2,m)$. Then one of the following must happen:
1. No vertex in $R(3,2,m)$ has a unique label under $\ell$.

2. There exists a vertex $w$ of degree 4 with a unique label under $\ell$.

The second condition follows because the vertices of degree smaller than 4 already have non-unique
labels due to condition 2 of the labeling set $\mathcal{L}(R(3,2,m))$.

In case 1, $R(3,2,m)$ can have at most $2m/2 + 1 = m+1 < m+2$ unique labels under $\ell$. In
case 2, since $w$ has a unique label and degree 4, the neighboring couples $(u,v)$, $(u',v')$ have the same labels
under $\ell$, i.e. $\ell(u) = \ell(u')$ and $\ell(v) = \ell(v')$. Hence we can identify the couples $(u,v)$, $(u',v')$, delete
$w$ and its incident edges to obtain a ribbon $R(3,2,m-1)$ of length $2m-2$ and an induced labeling
$\ell'$ thereof. By the induction hypothesis $|\mathrm{range}(\ell')| \le m+1$, hence $|\mathrm{range}(\ell)| = |\mathrm{range}(\ell')| + 1 \le m+2$, as
required. By Lemma 4.15 we obtain that $\|\mathcal{J}_{3,2} - \widetilde{\mathcal{J}}_{3,2}\|_2 \lesssim \alpha_4 p n$ with probability at least $1 - n^{-5}$.
By Eq. (199), it follows that with probability at least $1 - 2n^{-5}$

$$\|\mathcal{J}_{3,2}\|_2 \lesssim \alpha_4 p n \le \alpha_4 n.$$

This completes the proof of the lemma. □


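For the case $\eta = 1$ we prove the following.

Lemma 4.23. Recall that $\mathcal{P}_2 : \mathbb{R}^{\binom{[n]}{2}} \to \mathbb{R}^{\binom{[n]}{2}}$ is the orthogonal projector onto the space $V_2 \subseteq \mathbb{R}^{\binom{[n]}{2}}$
(defined in Section 4.5). Firstly, we have that $\mathcal{P}_2 \big(\sum_{\nu=1}^{4} \widetilde{\mathcal{J}}_{1,\nu}\big) = \big(\sum_{\nu=1}^{4} \widetilde{\mathcal{J}}_{1,\nu}\big) \mathcal{P}_2 = 0$. Further, with probability at
least $1 - 4n^{-5}$, we have that:

$$\Big\| \sum_{\nu=1}^{4} \widetilde{\mathcal{J}}_{1,\nu} \Big\|_2 \lesssim \alpha_4 n^{3/2}. \tag{202}$$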


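Proof. Recall from the definition of $\widetilde{\mathcal{J}}_{1,\nu}$ that

$$\Big( \sum_{\nu=1}^{4} \widetilde{\mathcal{J}}_{1,\nu} \Big)_{\{i,j\},\{k,\ell\}} = \alpha_4 p^3 \big( g_{ik} + g_{i\ell} + g_{jk} + g_{j\ell} \big). \tag{203}$$

Now, for any $v \in \mathbb{R}^{\binom{[n]}{2}}$,

$$\Big( \sum_{\nu=1}^{4} \widetilde{\mathcal{J}}_{1,\nu}\, v \Big)_{\{i,j\}} = \sum_{k<\ell} \alpha_4 p^3 \big( g_{ik} + g_{i\ell} + g_{jk} + g_{j\ell} \big)\, v_{\{k,\ell\}} = u_i + u_j,$$

where we define $u_i = \sum_{k<\ell} \alpha_4 p^3 (g_{ik} + g_{i\ell})\, v_{\{k,\ell\}}$. It follows that $\sum_{\nu=1}^{4} \widetilde{\mathcal{J}}_{1,\nu}\, v \in V_0 \oplus V_1$, and
hence $\mathcal{P}_2 \sum_{\nu=1}^{4} \widetilde{\mathcal{J}}_{1,\nu} = 0$. Since $\sum_{\nu=1}^{4} \widetilde{\mathcal{J}}_{1,\nu}$ is symmetric we obtain the first claim.

We prove the second claim, cf. Eq. (202), by the moment method, similar to Lemma 4.21.
Let $R(1,\nu,m)$ be a $(1,\nu)$-ribbon of length $2m$. Then:

$$\mathrm{Tr}\big\{ (\widetilde{\mathcal{J}}_{1,\nu} \widetilde{\mathcal{J}}_{1,\nu}^{T})^m \big\} = (\alpha_4 p^3)^{2m} \sum_{\ell \in \mathcal{L}(R(1,\nu,m))}\ \prod_{e=\{u,v\} \in R(1,\nu,m)} g_{\ell(u)\ell(v)}. \tag{204}$$

By Lemma 4.15 it suffices to prove that $v_*(R(1,\nu,m)) = 3m+2$. The claim then follows, using
Lemma 4.15 and the union bound.

Let $\ell \in \mathcal{L}_2(R(1,\nu,m))$ be a contributing labeling of a ribbon $R(1,\nu,m)$ of length $2m$. Let
$G(1,\nu,m)$ be the graph obtained by identifying vertices in $R(1,\nu,m)$ with the same label. Notice
that $R(1,\nu,m)$ is a union of a cycle $D(m)$ of length $2m$ and $2m+1$ isolated vertices. The isolated
vertices can have arbitrary labels, hence $v_*(R(1,\nu,m)) = 2m+1 + v_*(D(m)) = 3m+2$, since $v_*(D(m)) = m+1$ as proved
in Proposition 4.19. □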


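Lemma 4.24. We have that:

$$(\widetilde{\mathcal{J}}_{2,2} + \widetilde{\mathcal{J}}_{2,4})\, \mathcal{P}_2 = 0, \tag{205}$$

$$\mathcal{P}_2\, (\widetilde{\mathcal{J}}_{2,3} + \widetilde{\mathcal{J}}_{2,5}) = 0. \tag{206}$$

Further, with probability at least $1 - 2n^{-4}$, we have the following bounds on the spectral
norm of the terms $\widetilde{\mathcal{J}}_{2,2}$, $\widetilde{\mathcal{J}}_{2,3}$, $\widetilde{\mathcal{J}}_{2,4}$, $\widetilde{\mathcal{J}}_{2,5}$:

$$\|\widetilde{\mathcal{J}}_{2,2}\|_2 \lesssim (\alpha_4 p^2)\, n^{3/2}, \tag{207}$$

$$\|\widetilde{\mathcal{J}}_{2,4}\|_2 \lesssim (\alpha_4 p^2)\, n^{3/2}. \tag{208}$$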


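Proof. It is easy to check that $\widetilde{\mathcal{J}}_{2,2} = \widetilde{\mathcal{J}}_{2,3}^{T}$ and $\widetilde{\mathcal{J}}_{2,4} = \widetilde{\mathcal{J}}_{2,5}^{T}$. We prove Eq. (206), from which Eq. (205)
follows by taking transposes of each side. From the definition of $\widetilde{\mathcal{J}}_{2,\nu}$ we have, for any $v \in \mathbb{R}^{\binom{[n]}{2}}$,

$$(\widetilde{\mathcal{J}}_{2,3}\, v + \widetilde{\mathcal{J}}_{2,5}\, v)_{\{i,j\}} = \sum_{k<\ell} \alpha_4 p^2 \big( g_{ik} g_{i\ell} + g_{jk} g_{j\ell} \big)\, v_{\{k,\ell\}} \tag{209}$$

$$= u_i + u_j, \tag{210}$$

where we let $u_i = \sum_{k<\ell} \alpha_4 p^2 (g_{ik} g_{i\ell})\, v_{\{k,\ell\}}$. It follows that $(\widetilde{\mathcal{J}}_{2,3} + \widetilde{\mathcal{J}}_{2,5})\, v \in V_0 \oplus V_1$, hence
$\mathcal{P}_2 (\widetilde{\mathcal{J}}_{2,3} + \widetilde{\mathcal{J}}_{2,5}) = 0$.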

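We prove the claim on the spectral norm for $\widetilde{\mathcal{J}}_{2,2}$. The claim for $\widetilde{\mathcal{J}}_{2,4}$ holds in an analogous
fashion. Let $R(2,2,m)$ be a $(2,2)$-ribbon of length $2m$. Then:

$$\mathrm{Tr}\big\{ (\widetilde{\mathcal{J}}_{2,2}^{T} \widetilde{\mathcal{J}}_{2,2})^m \big\} = \sum_{\ell \in \mathcal{L}(R(2,2,m))} (\alpha_4 p^2)^{2m} \prod_{e=\{u,v\} \in R(2,2,m)} g_{\ell(u)\ell(v)}. \tag{211}$$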


By Lemma 4.15 it suffices to show that $v_*(R(2,2,m)) = 3m+2$, i.e. a contributing labeling $\ell$ maps
to at most $3m+2$ unique labels. Notice that $R(2,2,m)$ is the union of $m+1$ isolated vertices
and a bridge $B(m)$ of length $2m$. The isolated vertices are unconstrained and hence contribute
at most $m+1$ new labels. It suffices, hence, to prove that $B(m)$ has at most $2m+1$ unique
labels under its labeling $\ell_{B(m)}$ induced by $\ell$. Since $\ell_{B(m)}$ is contributing for $B(m)$, it suffices that
$v_*(B(m)) = 2m+1$. We prove this by induction on $m$. In the base case of $m = 1$, this amounts to
$B(1)$ having at most $3 = (2\cdot 1 + 1)$ unique labels. Assuming that the claim is true for bridges of length
at most $2m$ for $m \ge 1$, we show that it holds for a bridge $B(m+1)$ of length $2m+2$. $B(m+1)$
contains $3m+4$ vertices, hence there are 3 cases:

1. For every vertex $u \in B$ there exists a different vertex $u' \in B$ such that $\ell_B(u) = \ell_B(u')$.

2. There exists a vertex $u \in B$ which has a unique label under $\ell_B$ and $u$ has degree 4.

3. There exists a vertex $u \in B$ which has a unique label under $\ell_B$ with degree 2.

In the first case, $|\mathrm{range}(\ell_B)| \le (3m+4)/2 < 2(m+1)+1$, hence the claim holds.

In the second case, if the neighboring couples are $(u_1,v_1)$, $(u_2,v_2)$, then $\ell_{B(m+1)}(u_1) =
\ell_{B(m+1)}(u_2)$ and $\ell_{B(m+1)}(v_1) = \ell_{B(m+1)}(v_2)$. We can then contract the neighbors of $u$ and delete
$u$ and its incident edges to obtain a bridge $B(m)$ of length $2m$ (and an induced labeling $\ell_{B(m)}$). By
induction $\ell_{B(m)}$ maps to at most $2m+1$ labels, hence $\ell_{B(m+1)}$ maps to at most $2m+1+1 < 2(m+1)+1$
labels.

In the third case, if $u$ has neighbors $u_1, u_2$ then $\ell_{B(m+1)}(u_1) = \ell_{B(m+1)}(u_2)$. If we now identify
the neighbors of $u$ with the same label, and delete $u$ and the edges incident on it, we obtain a bridge
$B(m)$ of length $2m$, and an induced labeling $\ell_{B(m)}$ which is contributing. By induction, $B(m)$ has
at most $2m+1$ unique labels, hence $B(m+1)$ has at most $2m+1+2 = 2(m+1)+1$ unique labels.
This completes the induction. □


Finally, we have to deal with the remainder terms (recall that the matrix $K$ is defined in Eq. (177)).

Lemma 4.25. We have with probability at least $1 - n^{-5}$ that:

$$\|K\|_2 \lesssim \alpha_4 n^{1/2}. \tag{212}$$


Proof. We compute $\mathrm{Tr}\{(K^{T}K)^m\}$. Note that:

$$\mathrm{Tr}\{(K^{T}K)^m\} = \sum_{A_1,\dots,A_m,\ B_1,\dots,B_m}\ \prod_{l=1}^{m} \big( K_{A_l B_l}\, K_{A_{l+1} B_l} \big) \tag{213}$$

$$= \sum_{A_1,\dots,A_m,\ B_1,\dots,B_m}\ \prod_{l=1}^{m} K_{A_l B_l}\, K_{A_{l+1} B_l}\, \mathbb{I}(|A_l \cap B_l| = 1)\, \mathbb{I}(|A_{l+1} \cap B_l| = 1). \tag{214}$$

Here we set $A_{m+1} = A_1$. The second equality follows since $K$ is supported on entries $A, B$ such
that $A, B$ share exactly one vertex. Recalling the definition of star ribbons, each term that does
not vanish in the summation above corresponds to a labeling of a star ribbon $S(2,1,m) \in \mathcal{S}^{m}_{2,1}$ formed
from a $(2,1)$-ribbon of length $2m$, i.e. we have:

$$\mathrm{Tr}\{(K^{T}K)^m\} = \alpha_4^{2m} \sum_{S(2,1,m) \in \mathcal{S}^{m}_{2,1}}\ \sum_{\ell \in \mathcal{L}(S(2,1,m))}\ \prod_{e=\{u,v\} \in S(2,1,m)} g_{\ell(u),\ell(v)}. \tag{215}$$

Since there are at most $2^{2m} = 4^m$ star ribbons of length $2m$, it suffices, by a simple extension of
Lemma 4.15, to show that $v_*(S(2,1,m)) = m+2$. Note that every $S(2,1,m)$ is a union of 2 paths,
one of length $m'$ and the other of length $2m - m'$ for some $m' \in [2m]$, hence has at most 2 connected
components. Let $\ell$ be a contributing labeling of $S(2,1,m)$ and $G_{S(2,1,m)}$ be the graph obtained
by identifying vertices in $S(2,1,m)$ with the same label. Since $S(2,1,m)$ is a union of two paths,
$G_{S(2,1,m)}$ has at most 2 connected components. Furthermore, since $\ell$ is a contributing labeling,
every labeled edge in $S(2,1,m)$ repeats at least twice, hence $G_{S(2,1,m)}$ has at most $2m/2 = m$ edges.
Consequently, it has at most $m+2$ vertices, implying that $v_*(S(2,1,m)) \le m+2$. □
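The last counting step uses the elementary fact that a graph with $e$ distinct edges and $c$ connected components has at most $e + c$ vertices; with $e \le m$ and $c \le 2$ this gives $m+2$. A minimal sketch that checks this inequality on random small multigraphs follows. The helper names and the random test harness are illustrative assumptions, not part of the paper.

```python
import random

def count_components(num_vertices, edges):
    """Number of connected components, computed with union-find (path halving)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return len({find(x) for x in range(num_vertices)})

def vertex_bound_holds(num_vertices, edges):
    """Check |V| <= (#distinct edges) + (#connected components)."""
    distinct_edges = {frozenset(e) for e in edges}
    return num_vertices <= len(distinct_edges) + count_components(num_vertices, edges)

def random_check(trials=200, seed=0):
    """Test the bound on random multigraphs with up to 8 vertices and 10 edges."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 8)
        k = rng.randint(0, 10)
        edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(k)]
        if not vertex_bound_holds(n, edges):
            return False
    return True
```

In the setting of the proof, the graph obtained by identifying equal labels has at most $m$ distinct edges and at most 2 components, so `vertex_bound_holds` corresponds to the bound of $m+2$ on the number of distinct labels.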

Finally, we deal with the differences $\mathcal{J}_{\eta,\nu} - \widetilde{\mathcal{J}}_{\eta,\nu}$ (recall that $\mathcal{J}_{\eta,\nu}$ and $\widetilde{\mathcal{J}}_{\eta,\nu}$ are defined in Eqs.
(176) and (178)).

Lemma 4.26. With probability at least $1 - 6n^{-5}$, for each $\eta \le 2$ and $\nu \le \binom{4}{\eta}$:

$$\|\mathcal{J}_{\eta,\nu} - \widetilde{\mathcal{J}}_{\eta,\nu}\|_2 \lesssim \alpha_4 n. \tag{216}$$


Proof. We first consider $\mathrm{Tr}\big\{ \big( (\mathcal{J}_{\eta,\nu} - \widetilde{\mathcal{J}}_{\eta,\nu})^{T} (\mathcal{J}_{\eta,\nu} - \widetilde{\mathcal{J}}_{\eta,\nu}) \big)^m \big\}$. Let $R(\eta,\nu,m)$ be a $(\eta,\nu)$-ribbon of
length $2m$. As in the previous lemmas, we can write this trace as a sum
over labelings of $R(\eta,\nu,m)$ as follows:

$$\mathrm{Tr}\Big\{ \big( (\mathcal{J}_{\eta,\nu} - \widetilde{\mathcal{J}}_{\eta,\nu})^{T} (\mathcal{J}_{\eta,\nu} - \widetilde{\mathcal{J}}_{\eta,\nu}) \big)^m \Big\} = (\alpha_4 p^{4-\eta})^{2m} \sum_{\ell \in \mathcal{L}(R(\eta,\nu,m))}\ \prod_{e=\{u,v\} \in R(\eta,\nu,m)} g_{\ell(u),\ell(v)}. \tag{217}$$

Here we restrict the labelings $\ell$ to the subset $\mathcal{L}(R(\eta,\nu,m))$ that satisfy the criteria:

1. For every couple $(u,v)$, $\ell(u) < \ell(v)$.

2. Consider any adjacent pair of couples $(u_1,v_1)$, $(u_2,v_2)$ in $R(\eta,\nu,m)$; at least one of $u_1, v_1, u_2, v_2$
has degree 0. Assume this is $u_1$ (without loss of generality), then either $\ell(u_1) = \ell(u_2)$ or
$\ell(u_1) = \ell(v_2)$.

On taking expectations, the only labelings that do not vanish satisfy the additional criterion that every
labeled edge is repeated at least twice in $R(\eta,\nu,m)$. We call this set of labelings $\mathcal{L}_2(R(\eta,\nu,m))$.

As in Lemma 4.24 it suffices to show that $|\mathcal{L}_2(R(\eta,\nu,m))| \le \binom{n}{2m+2} \big( 2^{2m} (2m+2)^{3m+2} \big)$. This follows
from the same arguments as in Lemmas 4.23, 4.24 (for $\eta = 1, 2$ respectively), with the additional
caveat that the isolated vertices in $R(\eta,\nu,m)$ are not unconstrained as before. Indeed, once the
labels of the connected component of $R(\eta,\nu,m)$ are decided, there are only $2^{m}$ possible ways of
choosing the labels for the isolated vertices. Consequently, we have the bound:

$$\mathbb{E}\, \mathrm{Tr}\Big\{ \big( (\mathcal{J}_{\eta,\nu} - \widetilde{\mathcal{J}}_{\eta,\nu})^{T} (\mathcal{J}_{\eta,\nu} - \widetilde{\mathcal{J}}_{\eta,\nu}) \big)^m \Big\} \le (\alpha_4 p^{4-\eta})^{2m}\, |\mathcal{L}_2(R(\eta,\nu,m))| \tag{218}$$

$$\le \binom{n}{2m+2} (2\alpha_4 p^{4-\eta})^{2m} (2m+2)^{3m+2}. \tag{219}$$

Applying Lemma 4.13, the union bound and the triangle inequality yields the final result. □


We can now prove Proposition 4.20.

Proof of Proposition 4.20. The intersection of the high probability events of Lemmas 4.21, 4.22, 4.23,
4.24, 4.25 and 4.26 holds with probability at least $1 - 25n^{-5}$. We will condition on this event for
the proof of the proposition.

We bound each of the projections $\mathcal{P}_a (H_{22} - \mathbb{E}\{H_{22}\}) \mathcal{P}_b$ for $a, b \in \{0,1,2\}$ using the decomposition (180).

Let us first consider $a = b$, $a, b \in \{0,1\}$, cf. Eq. (170). By application of the above lemmas, the
triangle inequality, and the fact that $\|\mathcal{P}_a X \mathcal{P}_b\|_2 \le \|\mathcal{P}_a\|_2 \|X\|_2 \|\mathcal{P}_b\|_2 \le \|X\|_2$ for any $X \in \mathbb{R}^{\binom{[n]}{2}\times\binom{[n]}{2}}$
in the decomposition Eq. (180), we get

$$\|\mathcal{P}_a (H_{22} - \mathbb{E}\{H_{22}\}) \mathcal{P}_a\|_2 \lesssim \alpha_3 n^{1/2} + \alpha_4 n + \alpha_4 n^{3/2} \tag{220}$$

$$\lesssim \alpha_3 n^{1/2} + \alpha_4 n^{3/2}. \tag{221}$$

This proves Eq. (170).

The case $a = b = 2$ is treated in the same manner, with the only difference that, when
bounding $\|\mathcal{P}_2 (H_{22} - \mathbb{E}\{H_{22}\}) \mathcal{P}_2\|_2$, the terms of the type $\alpha_4 n^{3/2}$ do not appear (see Lemmas
4.23, 4.24). Hence:

$$\|\mathcal{P}_2 (H_{22} - \mathbb{E}\{H_{22}\}) \mathcal{P}_2\|_2 \lesssim \alpha_3 n^{1/2} + \alpha_4 n \tag{222}$$

$$\lesssim \alpha_3 n^{1/2} + \alpha_4 n. \tag{223}$$

This proves Eq. (171).

The bound for the cross terms $\|\mathcal{P}_a (H_{22} - \mathbb{E}\{H_{22}\}) \mathcal{P}_b\|_2$ for $a \ne b$ is identical to that for the
case $a = b = 0$ above.

This proves Eq. (172) and hence finishes our proof of Proposition 4.20. □


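4.8 Controlling $H_{12} - \mathbb{E}\{H_{12}\}$

We prove the following proposition for the deviation $H_{12} - \mathbb{E}\{H_{12}\}$.

Proposition 4.27. With probability at least $1 - 5n^{-5}$ the following is true:

$$\|H_{12} - \mathbb{E}\{H_{12}\}\|_2 \lesssim \alpha_3 n. \tag{224}$$

Recall that an entry of $H_{12} \in \mathbb{R}^{\binom{[n]}{1}\times\binom{[n]}{2}}$ can be written as:

$$(H_{12})_{A,B} = \begin{cases} \alpha_2 - \alpha_1\alpha_2 & \text{if } |A \cap B| = 1, \\ \alpha_3 (p + g_{A,h(B)})(p + g_{A,t(B)}) - \alpha_1\alpha_2 & \text{otherwise.} \end{cases} \tag{225}$$

Define the matrices $L_{\eta,\nu} \in \mathbb{R}^{\binom{[n]}{1}\times\binom{[n]}{2}}$ for $\eta = 1, 2$, $\nu \le \binom{2}{\eta}$, i.e. $L_{2,1}$ and $L_{1,\nu}$ for $\nu = 1, 2$:

$$(L_{2,1})_{A,B} = \begin{cases} \alpha_3\, g_{A,h(B)}\, g_{A,t(B)} & \text{if } |A \cap B| = 0, \\ 0 & \text{otherwise,} \end{cases} \tag{226}$$

$$(L_{1,1})_{A,B} = \begin{cases} \alpha_3\, p\, g_{A,h(B)} & \text{if } |A \cap B| = 0, \\ 0 & \text{otherwise,} \end{cases} \tag{227}$$

$$(L_{1,2})_{A,B} = \begin{cases} \alpha_3\, p\, g_{A,t(B)} & \text{if } |A \cap B| = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{228}$$

It thus follows that:

$$H_{12} - \mathbb{E}\{H_{12}\} = L_{1,1} + L_{1,2} + L_{2,1}. \tag{229}$$

We first prove two lemmas on the spectral properties of the matrices $L_{\eta,\nu}$.

Lemma 4.28. With probability at least $1 - n^{-5}$, we have that

$$\|L_{2,1}\|_2 \lesssim \alpha_3 n. \tag{230}$$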
Proof. Note that:

$$\mathrm{Tr}\big\{ (L_{2,1} L_{2,1}^{T})^m \big\} = \alpha_3^{2m} \sum_{A_1,\dots,A_{m+1},\ B_1,\dots,B_m}\ \prod_{l=1}^{m} g_{A_l h(B_l)}\, g_{A_l t(B_l)}\, g_{A_{l+1} h(B_l)}\, g_{A_{l+1} t(B_l)}. \tag{231}$$

Equivalently, letting $B(m)$ be a bridge of length $2m$ we have:

$$\mathrm{Tr}\big\{ (L_{2,1} L_{2,1}^{T})^m \big\} = \alpha_3^{2m} \sum_{\ell \in \mathcal{L}(B)}\ \prod_{e=\{u,v\} \in B} g_{\ell(u)\ell(v)}. \tag{232}$$

By Lemma 4.15 it suffices to show that $v_*(B(m)) \le 2m+1$. This argument is already covered in
Lemma 4.24 and the claim hence follows. □

Lemma 4.29. With probability exceeding $1 - 2n^{-5}$ the following holds:

$$\max_{\nu=1,2} \|L_{1,\nu}\|_2 \lesssim \alpha_3 n. \tag{233}$$

Proof. We prove the claim for $L_{1,1}$. The same argument applies for $L_{1,2}$ with minor modifications.

$$\mathrm{Tr}\big\{ (L_{1,1} L_{1,1}^{T})^m \big\} = \sum_{A_1,\dots,A_{m+1},\ B_1,\dots,B_m} (\alpha_3 p)^{2m} \prod_{l=1}^{m} g_{A_l h(B_l)}\, g_{A_{l+1} h(B_l)}. \tag{234}$$

The above is a sum over labelings of a bridge $B(m)$ of type 1 and class 1, of length $2m$. This is a union
of a cycle $D(m)$ of length $2m$, and $m$ isolated vertices. The lemma follows from Lemma 4.15 if
$v_*(B(m)) \le 2m+1$. But by the above decomposition $v_*(B(m)) \le v_*(D(m)) + m = m+1+m =
2m+1$, as in Proposition 4.19. This completes the proof. □
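The value $v_*(D(m)) = m+1$ used above can be confirmed by brute force for small $m$. The sketch below assumes that a contributing labeling of the cycle is one in which every labeled edge, viewed as an unordered pair of labels, appears at least twice; it ignores any further constraint that the paper's definition may impose, and the function name is hypothetical.

```python
import itertools
from collections import Counter

def max_labels_on_cycle(m):
    """Largest number of distinct labels over labelings of a cycle of length 2m
    in which every unordered labeled edge occurs at least twice."""
    length = 2 * m
    # Merging two labels keeps a labeling contributing, so an alphabet of
    # m + 2 symbols is enough to detect any count above m + 1.
    alphabet = range(m + 2)
    best = 0
    for labels in itertools.product(alphabet, repeat=length):
        edge_counts = Counter(
            frozenset((labels[k], labels[(k + 1) % length])) for k in range(length)
        )
        if all(count >= 2 for count in edge_counts.values()):
            best = max(best, len(set(labels)))
    return best
```

The search is exponential in $m$ and is meant only as a sanity check for very small cycles; the general statement follows from the tree-counting argument referenced in the proof.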

We can now prove Proposition 4.27.

Proof of Proposition 4.27. The intersection of the favorable events of Lemmas 4.28, 4.29 holds with probability
at least $1 - 5n^{-5}$. The required claim then follows from Lemmas 4.28, 4.29 and the triangle
inequality. □


4.9 Proof of Proposition 4.1


The intersection of high probability favorable events of Propositions 4.19, 4.20 and 4.27 holds with
probability at least $1 - 30n^{-5} \ge 1 - n^{-4}$ for large enough $n$. By Proposition 4.19 we already have


the required bounds on Pfu and , cf. Eqs. (55) and (56). It remains to show that on the same 
event: 


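$$H_{22} \succeq \frac{1}{\alpha_1} H_{12}^{T} Q_n^{\perp} H_{12} + \frac{1}{n(\alpha_2 p - \alpha_1^2)} H_{12}^{T} Q_n H_{12}, \tag{235}$$

or, equivalently,

$$\mathbb{E}\{H_{22}\} \succeq \mathbb{E}\{H_{22}\} - H_{22} + \frac{1}{\alpha_1} H_{12}^{T} Q_n^{\perp} H_{12} + \frac{1}{n(\alpha_2 p - \alpha_1^2)} H_{12}^{T} Q_n H_{12}. \tag{236}$$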
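Let $W, \overline{W} \in \mathbb{R}^{3\times 3}$ be two matrices that satisfy, for $a, b \in \{0,1,2\}$:

$$W_{ab} = \|\mathcal{P}_a\, \mathbb{E}\{H_{22}\}\, \mathcal{P}_b\|_2, \tag{237}$$

$$\overline{W}_{ab} \ge \|\mathcal{P}_a (H_{22} - \mathbb{E}\{H_{22}\}) \mathcal{P}_b\|_2 + \frac{1}{\alpha_1} \|Q_n^{\perp} H_{12} \mathcal{P}_a\|_2\, \|Q_n^{\perp} H_{12} \mathcal{P}_b\|_2 + \frac{1}{n(\alpha_2 p - \alpha_1^2)} \|Q_n H_{12} \mathcal{P}_a\|_2\, \|Q_n H_{12} \mathcal{P}_b\|_2. \tag{238}$$

By expanding the Rayleigh quotient of each term in Eq. (236), and noting that $W_{ab} = 0$ for $a \ne b$,
it is straightforward to see that Eq. (236) holds if

$$\alpha_2 p - \alpha_1^2 > 0, \tag{239}$$

$$W \succeq \overline{W}. \tag{240}$$

The first condition corresponds to assumption (53). For the second one, we develop explicit
expressions of $W$, $\overline{W}$ as follows. For $W$, we use Proposition 4.16, which yields immediately $W_{ab} = 0$ for
$a \ne b$ as claimed, and $W_{0,0}$, $W_{1,1}$, $W_{2,2}$ as in Eqs. (43), (44), (45).

In order to develop expressions for $\overline{W}$ we note that it is sufficient to guarantee

$$\begin{aligned}
\overline{W}_{ab} \ge{}& \|\mathcal{P}_a (H_{22} - \mathbb{E}\{H_{22}\}) \mathcal{P}_b\|_2 \\
&+ \frac{1}{\alpha_1} \big( \|Q_n^{\perp} \mathbb{E}\{H_{12}\} \mathcal{P}_a\|_2 + \|H_{12} - \mathbb{E}\{H_{12}\}\|_2 \big) \big( \|Q_n^{\perp} \mathbb{E}\{H_{12}\} \mathcal{P}_b\|_2 + \|H_{12} - \mathbb{E}\{H_{12}\}\|_2 \big) \\
&+ \frac{1}{n(\alpha_2 p - \alpha_1^2)} \big( \|Q_n \mathbb{E}\{H_{12}\} \mathcal{P}_a\|_2 + \|H_{12} - \mathbb{E}\{H_{12}\}\|_2 \big) \big( \|Q_n \mathbb{E}\{H_{12}\} \mathcal{P}_b\|_2 + \|H_{12} - \mathbb{E}\{H_{12}\}\|_2 \big).
\end{aligned} \tag{241}$$

Using the upper bounds in Propositions 4.18, 4.20, 4.27 we obtain the expressions in Eqs. (46) to
(51). This completes the proof.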


References 

[AKS98] Noga Alon, Michael Krivelevich, and Benny Sudakov, Finding a large hidden clique in 
a random graph, Proceedings of the ninth annual ACM-SIAM symposium on Discrete 
algorithms, Society for Industrial and Applied Mathematics, 1998, pp. 594-598. 

[AV11] Brendan P.W. Ames and Stephen A. Vavasis, Nuclear norm minimization for the planted 
clique and biclique problems, Mathematical programming 129 (2011), no. 1, 69-89. 

[Bar14] Boaz Barak, Sums of Squares upper bounds, lower bounds, and open questions (Lecture
notes, Fall 2014), http://www.boazbarak.org/sos/, 2014.

[BR13] Quentin Berthet and Philippe Rigollet, Complexity theoretic lower bounds for sparse 
principal component detection, Conference on Learning Theory, 2013, pp. 1046-1066. 

[BS14] Boaz Barak and David Steurer, Sum-of-squares proofs and the quest toward optimal 
algorithms, arXiv:1404.5236 (2014). 


[CLR15] T. Tony Cai, Tengyuan Liang, and Alexander Rakhlin, Computational and statistical
boundaries for submatrix localization in a large noisy matrix, arXiv:1502.01988 (2015).

[CX14] Yudong Chen and Jiaming Xu, Statistical-computational tradeoffs in planted problems
and submatrix localization with a growing number of clusters and submatrices,
arXiv:1402.1267 (2014).

[DGGP11] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres, Finding hidden cliques in linear time
with high probability, ANALCO, SIAM, 2011, pp. 67-75.

[DM14] Yash Deshpande and Andrea Montanari, Finding hidden cliques of size $\sqrt{N/e}$ in nearly
linear time, Foundations of Computational Mathematics (2014), 1-60.

[FGR+12] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh Vempala, and Ying Xiao,
Statistical algorithms and a lower bound for planted clique, arXiv:1201.1214 (2012).

[FK81] Zoltán Füredi and János Komlós, The eigenvalues of random symmetric matrices,
Combinatorica 1 (1981), no. 3, 233-241.

[FK00] Uriel Feige and Robert Krauthgamer, Finding and certifying a large hidden clique in a
semirandom graph, Random Structures and Algorithms 16 (2000), no. 2, 195-208.

[FR10] Uriel Feige and Dorit Ron, Finding hidden cliques in linear time, DMTCS Proceedings
(2010), no. 01, 189-204.

[GM75] Geoffrey R Grimmett and Colin JH McDiarmid, On colouring random graphs,
Mathematical Proceedings of the Cambridge Philosophical Society, vol. 77, Cambridge Univ
Press, 1975, pp. 313-324.

[Has96] Johan Håstad, Clique is hard to approximate within $n^{1-\epsilon}$, Foundations of Computer
Science, 1996. Proceedings., 37th Annual Symposium on, IEEE, 1996, pp. 627-636.

[HWX14] Bruce Hajek, Yihong Wu, and Jiaming Xu, Computational lower bounds for community
detection on random graphs, arXiv preprint arXiv:1406.6625 (2014).

[Jer92] Mark Jerrum, Large cliques elude the Metropolis process, Random Structures &
Algorithms 3 (1992), no. 4, 347-359.

[JL09] Iain M Johnstone and Arthur Yu Lu, On consistency and sparsity for principal
components analysis in high dimensions, Journal of the American Statistical Association 104
(2009), no. 486.

[Kar72] Richard M. Karp, Reducibility among combinatorial problems, Complexity of Computer
Computations (R. E. Miller and J. W. Thatcher, eds.), Plenum, 1972.

[Kho01] Subhash Khot, Improved inapproximability results for maxclique, chromatic number and
approximate graph coloring, Foundations of Computer Science, 2001. Proceedings. 42nd
IEEE Symposium on, IEEE, 2001, pp. 600-609.

[Las01] Jean B. Lasserre, Global optimization with polynomials and the problem of moments,
SIAM Journal on Optimization 11 (2001), no. 3, 796-817.



[MW13a] Zongming Ma and Yihong Wu, Computational barriers in minimax submatrix detection,
arXiv:1309.5914 (2013).

[MW13b] Raghu Meka and Avi Wigderson, Association schemes, non-commutative polynomial
concentration, and sum-of-squares lower bounds for planted clique, Electronic Colloquium
on Computational Complexity (ECCC), vol. 20, 2013, p. 105.

[OJF+12] Samet Oymak, Amin Jalali, Maryam Fazel, Yonina C Eldar, and Babak Hassibi,
Simultaneously structured models with application to sparse and low-rank matrices,
arXiv:1212.3753 (2012).

[Par03] Pablo A. Parrilo, Semidefinite programming relaxations for semialgebraic problems,
Mathematical programming 96 (2003), no. 2, 293-320.

[Ser77] Jean-Pierre Serre, Linear representations of finite groups, Graduate Texts in
Mathematics 42 (1977).

[Sho87] NZ Shor, Class of global minimum bounds of polynomial functions, Cybernetics and
Systems Analysis 23 (1987), no. 6, 731-734.

[SWPN09] Andrey A Shabalin, Victor J Weigman, Charles M Perou, and Andrew B Nobel, Finding
large average submatrices in high dimensional data, The Annals of Applied Statistics
(2009), 985-1012.

[Tul09] Madhur Tulsiani, CSP gaps and reductions in the Lasserre hierarchy, Proceedings of the
forty-first annual ACM symposium on Theory of computing, ACM, 2009, pp. 303-312.



Table 1: Definition of the different ribbon classes and types.